INTRODUCTION


One  of  our  long-standing  research  interests  is  in  understanding  the  molecular  principles 
that  govern  biological  specificities.  We  have  been  studying  both  protein-protein  interactions  and 
protein-DNA  interactions  through  a  combination  of  different  approaches  including  molecular, 
genetic,  biochemical  and  structural  approaches.  In  particular,  we  have  been  investigating  the 
molecular  functions  and  specificities  of  homeodomain  proteins  (human  PITX2  and  Drosophila 
Bicoid)  that  are  required  for  normal  embryonic  development.  The  analysis  of  molecular 
principles  governing  biological  specificities  is  not  only  important  as  a  basic  science  problem  but 
also  has  implications  in  cancer  prevention.  For  example,  the  anti-cancer  drug  geldanamycin  is  a 
specific inhibitor of the molecular chaperone Hsp90 [2]. Understanding how Hsp90 regulates
the  activities  of  its  client  proteins  in  controlling  normal  development  and  cellular  physiology 
represents  both  a  basic  biological  problem  and  a  medical  interest  in  search  for  improved 
therapeutic  uses  of  existing  anti-cancer  drugs  such  as  geldanamycin.  Our  recent  studies  have 
revealed  a  new  Hsp90  client  protein  (Bicoid)  that  is  involved  in  normal  development, 
establishing  a  foundation  for  further  analysis  of  the  actions  of  the  molecular  chaperone  Hsp90 
and the anti-cancer drug geldanamycin. Moreover, knowledge of biological specificities can
aid  the  development  of  new  anti-cancer  drugs.  There  are  two  major  challenges  to  the  design  of 
effective  drugs  against  breast  and  other  cancers.  First,  such  drugs  must  have  a  high  specificity 
for  cancer  cells;  this  is  important  for  killing  cancer  cells  while  causing  relatively  little  harm  to 
normal  cells.  Second,  the  mechanisms  of  cancer  cell  destruction  must  be  effective  and  designed 
to  minimize  ways  that  cancer  cells  can  develop  to  escape  the  drugs:  breast  cancer  cells  tend  to 
become resistant to anti-hormone drugs after such treatments. We proposed to investigate
cellular delivery methods for specifically targeting cancer cell destruction using a novel
double-targeting system. The long-term objective of our work is to understand the actions of existing
anti-cancer  drugs  and  to  investigate  concepts  leading  to  the  development  of  new  drugs  with  high 
specificity  and  efficacy  to  eradicate  breast  cancer. 


BODY 


We  have  made  progress  in  understanding  molecular  specificities  in  protein-protein  and 
protein-DNA  interactions.  These  studies  reflect  our  long-standing  research  interests  and  are 
further detailed in the attached publications. Briefly, the research paper by Fu and Ma (2005)
describes our analysis of the Drosophila transcription activator protein Bicoid (Bcd) and its
interaction with the co-factor dCBP. Our results show that dCBP can modulate the activity of
Bcd and facilitate the switch between the active and inactive states of Bcd in activating
transcription. The publications by Chaney et al. (2005) and Baird-Titus et al. (2006) describe our
collaborative studies to determine the solution structures of protein-DNA complexes and to
understand the specificity codes in protein-DNA interactions. Our studies reveal that the
homeodomains of both human PITX2 and Drosophila Bcd exhibit novel structural properties. In
addition, our results reveal that the protein-DNA interface responsible for determining the DNA
binding specificity is highly dynamic in both cases. A review article by the PI on transcription
on-off switches (Ma, 2005) and a book chapter by the PI on transcriptional activators and
activation mechanisms (Ma, 2006) are also included.


A. WT (48.2±1.6)   B. HSP83e6D (52.0±2.3)   C. HSP83e6A (47.6±2.5)


Fig. 1. Increased variability in hb expression caused by hsp83
mutations. Shown are hb expression profiles in embryos from wt (A)
or hsp83+/- females (B, C). The two different alleles of hsp83 used are
indicated. In our experiments, hb expression was detected in embryos
by in situ hybridization, embryo images were captured digitally, and
staining intensity was scanned and plotted. In these plots, the X-axis is
the egg length (0-1.0), whereas the Y-axis is the staining intensity
(normalized for each embryo to have values between 0 and 1.0). Each
curve in the plots represents the scanned profile of one individual
embryo. The mean value of the hb expression boundary (embryonic
location with 1/2 maximal hb expression) and the standard deviation
are listed at the top of each plot.
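
The boundary measurement described in the Fig. 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used for the figure: the function names (`normalize`, `boundary_position`, `summarize`), the min-max normalization, the linear interpolation between sampled points, the choice of the first anterior-to-posterior drop through the threshold, and the use of the sample standard deviation are all assumptions. The caption states only that intensity was normalized to 0-1.0 for each embryo and that the boundary is the location with half-maximal hb expression.

```python
from statistics import mean, stdev


def normalize(profile):
    """Rescale one embryo's staining intensities to span 0-1.0 (min-max; an assumption)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        raise ValueError("a flat profile cannot be normalized")
    return [(value - lo) / (hi - lo) for value in profile]


def boundary_position(positions, profile, threshold=0.5):
    """Return the first position, scanning anterior to posterior, where the
    normalized profile falls from at-or-above the threshold to below it."""
    norm = normalize(profile)
    for i in range(len(norm) - 1):
        a, b = norm[i], norm[i + 1]
        if a >= threshold > b:
            # Linear interpolation between the two flanking sample points.
            frac = (a - threshold) / (a - b)
            return positions[i] + frac * (positions[i + 1] - positions[i])
    raise ValueError("profile never drops through the threshold")


def summarize(boundaries):
    """Mean and sample standard deviation of boundary positions, in % egg length."""
    percent = [100.0 * b for b in boundaries]
    return mean(percent), stdev(percent)
```

Applying `boundary_position` to each embryo's scanned profile and passing the resulting list to `summarize` would give a mean and standard deviation of the kind listed at the top of each plot.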


IP  anti-HA  Input 


As part of our long-term objective toward understanding and improving existing anti-cancer
drugs, we recently investigated the roles of the molecular chaperone Hsp90 in normal development
in Drosophila. Hsp90 is a specific target of the anti-cancer drug geldanamycin [2]. In Drosophila,
Hsp90 is encoded by the gene Hsp83 [3]. Our recent unpublished experiments have revealed an
interaction between Bcd and Hsp90, suggesting that Bcd is a new client protein of Hsp90. In
early Drosophila embryos, the maternally contributed Bcd protein is distributed as an
anterior-to-posterior gradient that is responsible for activating its target genes such as
hunchback (hb) [4]. As reported previously [5] and confirmed by our data (Fig. 1A), the
Bcd-dependent expression of hb in the anterior half of the embryo is highly precise, with a
standard deviation of its expression border of 1.6%. However, in embryos with a reduced maternal
contribution of the Hsp83 activity, the hb expression profile exhibits an increased variability,
with a standard deviation of between 2.3 and 2.5% (Fig. 1B, C). Furthermore, the antimorphic
allele e6D of Hsp83 causes the hb expression boundary to be expanded toward the posterior, from
48.2% to 52.0% in egg length (Fig. 1B). These results suggest that Hsp90 plays a role in
regulating the activity of Bcd in vivo.


Lane:  1  2  3  4  5
Bcd:   -  +  +  -  +
MoO4:  -  -  +  -  -

Fig. 2. Co-IP experiments detecting the Bcd-Hsp90
interaction. Shown are the results of co-IP experiments
showing that Hsp90 is co-precipitated by an antibody
(anti-HA) that precipitates a tagged wt Bcd protein in the
nuclear extracts of S2 cells expressing the Bcd protein.
The western blot shown here was probed with a monoclonal
anti-Hsp90 antibody (top) or with anti-HA (bottom) to detect
the tagged wt Bcd protein. Molybdate (MoO4) was also
included in the co-IP experiment for lane 3. Input
represents 10% of the extracts used in co-IP.


To determine whether Bcd and Hsp90 can interact with each other physically, we carried
out co-IP experiments in Drosophila S2 cells. As shown in Fig. 2, Hsp90 is specifically
co-precipitated by an antibody against the HA tag that is attached to the wild-type Bcd protein
(lane 2). In the absence of Bcd, no Hsp90 is pulled down (lane 1). We have also analyzed the effect
of molybdate, another Hsp90 inhibitor that can "lock" or "freeze" the interactions between Hsp90
and some of its client proteins [6]. Our results show that molybdate does not affect the
Bcd-Hsp90 interaction (lane 3), indicating that this is a high-affinity binding.

To determine whether Hsp90 can affect the activation function of Bcd, we performed a
transient reporter assay in S2 cells. In this experiment, the activity of wt Bcd on a
Bcd-responsive hb-CAT reporter was determined in the presence or absence of the Hsp90 inhibitor
geldanamycin (Fig. 3). Our results reveal an effect of geldanamycin that is dependent on Bcd
concentration: it increases Bcd activity at low Bcd concentrations (lanes 3-10), but reduces its
activity at high concentrations (lanes 13-16). Geldanamycin has no effect on reporter activity
without Bcd (lanes 1-2). These results suggest that Hsp90 regulates Bcd activity negatively at
low Bcd concentrations (consistent with the posterior shift caused by hsp83e6D in embryos;
Fig. 1B) but positively at high Bcd concentrations (consistent with the notion that the molecular
chaperone Hsp90 may reduce misfolding of Bcd protein at high concentrations). Together, these new
findings provide the first demonstration that Hsp90 and Bcd can interact with each other.
They also establish a foundation for further investigating, with the powerful genetic tools
available, the molecular actions of Hsp90 and its inhibitors, including the anti-cancer
drug geldanamycin.

We  proposed  to  design  methods  to  deliver  into  cells  proteins  that  can  cause  cell 
destruction.  The  design  utilizes  the  properties  of  anthrax  toxin.  This  toxin  contains  three 
components:  two  enzymatic  proteins  lethal  factor  (LF)  and  edema  factor  (EF),  and  the  protective 
antigen  (PA),  which  is  responsible  for  translocating  both  LF  and  EF  into  the  cytosol  [7].  Such  a 
translocation  activity  of  PA  is  strictly  dependent  on  its  cleavage  by  cell-surface  furin  or  furin-like 
proteases  [7].  A  recent  study  has  shown  that  the  replacement  of  the  furin  cleavage  site  with  a 
urokinase  plasminogen  activator  (uPA)  target  site  can  alter  the  cleavage/activation  specificity  of 
PA  [8].  Such  an  engineered  PA  is  cleaved/activated  by  uPA,  which,  along  with  its  receptor 
(uPAR), is highly expressed in malignant cells [9]. Our design is to use such an altered-specificity
PA of anthrax toxin to translocate cell-destructing proteins that are linked to the
translocation domain of LF [1]. To further increase cancer cell specificity, cell-destructing
proteins are also linked to different EGF-like domains [10]: growth factor receptors, such as the
EGFR/ErbB  family  members,  are  overexpressed  in  malignant  cells.  This  double-targeting 
approach  is  designed  to  further  enhance  the  specificity  for  cancer  cells. 


Bcd (µg):     0.00  0.02  0.04  0.06  0.08  0.10  0.30  1.00
Fold change:  0.98  1.56  1.71  2.50  1.26  1.04  0.72  0.35

Fig. 3. Reporter assay in S2 cells. A hb-CAT reporter
(1 µg) was co-transfected with different amounts of a
Bcd-expressing plasmid in the presence or absence of
geldanamycin (GA). CAT activity at 1 µg of the
Bcd-expressing plasmid without GA (lane 15) is set at 100%.
The effect of GA (fold change) is listed.
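
The concentration-dependent effect of geldanamycin can be read directly off the fold-change row of Fig. 3. The short Python sketch below is only an illustration of that reading. It assumes that "fold change" means CAT activity with GA divided by CAT activity without GA at the same plasmid dose, and the 10% tolerance band used to call an effect is a hypothetical cut-off chosen for this example, not a criterion taken from the report.

```python
# Values transcribed from the Fig. 3 table (plasmid dose in micrograms).
BCD_UG = [0.00, 0.02, 0.04, 0.06, 0.08, 0.10, 0.30, 1.00]
FOLD_CHANGE = [0.98, 1.56, 1.71, 2.50, 1.26, 1.04, 0.72, 0.35]


def fold_change(cat_with_ga, cat_without_ga):
    """Assumed definition: CAT activity with GA divided by activity without GA."""
    if cat_without_ga <= 0:
        raise ValueError("activity without GA must be positive")
    return cat_with_ga / cat_without_ga


def classify(fc, tolerance=0.1):
    """Label a fold change; the 10% tolerance band is a hypothetical cut-off."""
    if fc > 1.0 + tolerance:
        return "increased"
    if fc < 1.0 - tolerance:
        return "reduced"
    return "no clear effect"


# Map each plasmid dose to the direction of the GA effect.
effects = {dose: classify(fc) for dose, fc in zip(BCD_UG, FOLD_CHANGE)}
```

Under these assumptions, the doses from 0.02 to 0.08 µg fall in the "increased" class and the 0.30 and 1.00 µg doses fall in the "reduced" class, which matches the biphasic pattern described in the text.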

Caspases can directly cause cell destruction through protein degradation and programmed
cell death [11]. However, there are several challenges to the use of these proteins in the cellular
delivery system outlined above. These enzymes are expressed as inactive precursors until they
are specifically activated through proteolytic processing and, moreover, the activation of several
of these enzymes (e.g., caspase-2, -8, -9 and -10) requires coordinated actions of additional
cellular co-factors [11, 12]. These issues complicated our original design of using caspase-9 and
hampered our progress toward successfully establishing a cellular delivery system for cell
destruction. In the future we will seek to investigate alternative possibilities, one of which is the
use of activated forms of other caspases. It has been shown [13] that caspase-6 and -3 can be
expressed as re-arranged enzymes that are constitutively active without requiring proteolytic
processing (Fig. 4A, B). Engineered hybrid proteins (Fig. 4C) can be expressed from engineered
gene constructs and tested in cells. In these hybrid proteins, the aspartate processing sites in the
linker regions of the re-arranged caspases need to be mutated to prevent auto-cleavage; such
mutations have been shown not to affect the caspase enzyme activity in degrading target proteins
[13]. Although the effectiveness of the use of hybrid proteins needs to be validated experimentally,
such a design may represent an important strategy toward efficiently and specifically eradicating
breast cancer.


[Fig. 4 diagram. Panel labels: Precursor; PD; LS; SS; Processing; Active form; Linker;
Re-arranged active form. Panel C labels: LF (1-254); EGF-like domain; D9A; D28A;
Re-arranged active form of caspase-3.]


Fig. 4. Use of re-arranged active forms of caspases in a cellular delivery
system. A. Shown is the processing of caspase-3, where the two subunits
(LS and SS) in the active form of the enzyme are marked. Also marked
are the aspartate (D) processing sites; PD, prodomain. B. Re-arranged,
constitutively active form of caspase-3 that does not require processing.
C. Hybrid protein for cellular delivery tests. The aspartate processing
sites in the linker region of the re-arranged caspase-3 will be mutated.
The hybrid protein also contains the domain of anthrax LF (residues
1-254) that is sufficient for cellular translocation [1] and the EGF-like
domain for cancer cell targeting. Not shown are protein tags that can
facilitate protein purification. Diagrams are not drawn to scale.


KEY  RESEARCH  ACCOMPLISHMENTS 

-Further  understanding  of  molecular  principles  determining  biological  specificity 

-Analysis  of  protein-DNA  interactions  in  biological  specificity 

-Analysis of protein-protein interactions in biological specificity/activity

-Interaction between an anti-cancer drug target (Hsp90) and a developmental protein (Bcd)

-Identifying  challenges  toward  design  of  effective  cancer  therapeutics 

-Concept  of  a  double-targeting  approach  for  increased  specificity 

-Selection of constitutively active forms (re-arranged) of caspases for cellular delivery

CONCLUSIONS

The  understanding  of  the  molecular  principles  governing  biological  specificities 
represents  not  only  a  basic  scientific  problem  but  also  a  critical  task  in  conquering  human 
diseases including breast and other cancers. Our recent finding that Bcd and Hsp90 can interact
with each other establishes a foundation for further investigating the molecular actions of Hsp90
and its specific inhibitor, the anti-cancer drug geldanamycin. Knowledge of biological
specificities can also aid the design of therapeutics for treating cancers, including the use of
cellular delivery systems with a double-targeting mechanism. Our long-term objective is to
understand the actions of existing anti-cancer drugs and to investigate new concepts leading to
the development of new drugs to eradicate breast cancer.

ABSTRACT

The Drosophila morphogenetic protein Bicoid (Bcd) can activate transcription in a
concentration-dependent manner in embryos. It contains a self-inhibitory domain that can interact
with the co-repressor Sin3A. In this report, we study a Bcd mutant, Bcd(A57-61), which has a
strengthened self-inhibitory function and is unable to activate the hb-CAT reporter in Drosophila
cells, to analyze the role of co-factors in regulating Bcd function. We show that increased
concentrations of the co-activator dCBP in cells can switch this protein from its inactive state
to an active state on the hb-CAT reporter. The C-terminal portion of Bcd(A57-61) is required to
mediate this activity-rescuing function of dCBP. Although capable of binding to DNA in vitro,
Bcd(A57-61) is unable to access the hb enhancer element in cells, suggesting that its DNA binding
defect is only manifested in a cellular context. Increased concentrations of dCBP restore not
only the ability of Bcd(A57-61) to access the hb enhancer element in cells but also the occupancy
of the general transcription factors TBP and TFIIB at the reporter promoter. These and other
results suggest that an activator can undergo switches between its active and inactive states
through sensing the opposing actions of positive and negative co-factors.

INTRODUCTION 

Regulation of gene transcription plays a critical role in many biological processes that range
from cell growth and differentiation to embryonic patterning (1,2). Genes that participate in
these biological processes need to be specifically turned on or off by transcription factors at
the appropriate time and location. It is becoming increasingly clear that many transcription
factors can act as both activators and repressors in a context-dependent manner [reviewed in
(3)]. Promoter/enhancer architecture and cellular levels of other proteins have been suggested
to play roles in influencing a transcription factor's regulatory functions, but the precise
mechanisms in most cases remain largely unclear. Proteins that can work as both activators and
repressors have three distinct activity states: active, repressive and inactive (neither active
nor repressive). In contrast, proteins that work only as activators, such as the Drosophila
protein Bicoid (Bcd), have only two activity states: active and inactive. Analysis of these
proteins can thus help us understand the important question of how the simple on-off switches of
activator activities are achieved. Bcd is a well-documented protein that undergoes such on-off
activity switches in a concentration-dependent manner (see below). The experiments described
here suggest another mechanism in which the opposing actions of positive and negative co-factors
can help Bcd switch between its active and inactive states in a manner that is independent
of Bcd concentration.

Bcd is a molecular morphogen that plays a critical role in patterning embryonic structures,
including the head and thorax (4,5). This 489 amino acid transcription factor contains a
homeodomain (residues 92-151) in its N-terminal portion (6). Bcd, which is distributed in the
early embryo as an anterior-to-posterior gradient, is responsible for activating specific target
genes in a concentration-dependent manner. For example, orthodenticle (otd), hunchback (hb) and
knirps (kni) are direct Bcd target genes that are required for patterning the head, thoracic and
abdominal structures, respectively (7). These genes are expressed in distinct parts of the embryo
by responding to different Bcd concentrations (8-10). Bcd has the ability to bind DNA in a highly
cooperative manner (11-14), and it has been suggested that the affinity of Bcd binding sites in
an enhancer can determine the concentration of Bcd required for activating transcription
(9,15,16). Our recent studies suggest that the arrangements of Bcd binding sites in an enhancer
can also play a critical role in regulating the activity of Bcd and contributing to its
concentration-dependent action (17).

*To whom correspondence should be addressed. Tel: +1 513 636 7977; Fax: +1 513 636 4317; Email: jun.ma@cchmc.org
Present address:
Dechen Fu, Department of Molecular and Cellular Biology, University of California, Berkeley, CA 94720, USA
© The Author 2005. Published by Oxford University Press. All rights reserved.

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access
version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press
are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but
only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oupjournals.org

3986 Nucleic Acids Research, 2005, Vol. 33, No. 13

CBP is a co-activator that interacts with many transcription factors and participates in the
activation process (18,19). Its histone acetyltransferase (HAT) enzymatic activity is thought to
alter chromatin structure by acetylating the histone tails, thus increasing the accessibility of
DNA for both gene-specific transcription factors and general transcription factors (GTFs). CBP
can also play a structural role by bridging between transcription factors and GTFs or by
recruiting other HAT activities (18,19). In Drosophila, dCBP has been shown to be a co-activator
for Ci (20), Mad (21) and Dorsal (22). dCBP also plays a role in helping Bcd activate
transcription (23). dCBP and Bcd can interact with each other through distinct domains on
different enhancers. In particular, on the hb enhancer element the C-terminal portion of Bcd
plays an important role in responding to the co-activation function of dCBP, whereas on the kni
enhancer element, the N-terminal domain plays an important role (23).

In addition to its ability to interact with co-activators, such as dCBP, Bcd can also interact
with co-repressors. An analysis of the N-terminal region of Bcd revealed a self-inhibitory
domain (residues 52-91) that can dramatically inhibit the ability of Bcd to activate
transcription (24). For example, on the hb-CAT reporter gene, which contains the Bcd-responsive
hb enhancer element, a Bcd derivative lacking the entire N-terminal domain, Bcd(92-489),
exhibits an activity 40 times higher than the full-length protein in Drosophila S2 cells. A
systematic analysis of the self-inhibitory domain identified a 10 amino acid motif (residues
52-61) that is most critical for the self-inhibitory function. Interestingly, mutations of
different residues in this motif can cause drastically opposing effects (25). In particular, the
mutant protein Bcd(A52-56), which has residues 52-56 changed to alanines, is 25 times more
active than wt Bcd on the hb-CAT reporter in S2 cells. In contrast, on the same reporter another
mutant, Bcd(A57-61), which has the neighboring five amino acids changed to alanines, is
virtually inactive (<2% of wt Bcd activity) at all concentrations. The co-repressor Sin3A has
been shown to interact with the evolutionarily conserved N-terminal domain of Bcd, and it is
proposed that mutations that alter the 10 amino acid motif can weaken or strengthen this
interaction, thus increasing or decreasing, respectively, the activity of Bcd (25). Another
component of the Sin3A-HDAC (histone deacetylase) complex, SAP18, has also been shown to
interact with Bcd, apparently through multiple Bcd domains [(24,26); see Figure 1A for a
schematic diagram of Bcd domains interacting with co-factors].

In this report, we use Bcd(A57-61), an inactive protein on the hb-CAT reporter in S2 cells, as a
tool to analyze the interplay between positive and negative co-factors in regulating Bcd
function. This mutant exhibits some special properties. In particular, while it is inactive on
the hb-CAT reporter gene, it can activate another reporter gene, kni-CAT (17). These and other
findings suggest that the activity state of this protein is intricately controlled, and we sought
to gain a better understanding of this mutant protein by focusing on the roles of, and the
interplay between, Bcd-interacting co-factors. In this report, we show that increased
concentrations of dCBP in S2 cells can switch this protein from an inactive state to an active
one on the hb-CAT reporter. We further show that the C-terminal domain of Bcd(A57-61) mediates
this activity-rescuing function of dCBP. We provide evidence demonstrating that, despite its
normal DNA binding ability in vitro, Bcd(A57-61) fails to occupy the hb enhancer element in
cells. High levels of dCBP in S2 cells restore the ability of Bcd(A57-61) to access the hb
enhancer element and enable this Bcd derivative to recruit the GTFs TBP and TFIIB to the target
promoter. We also provide evidence suggesting that dCBP may negatively affect the interaction
between Bcd and Sin3A in cells. Together, these results demonstrate that dCBP plays an important
role in regulating Bcd function in a dCBP concentration-dependent manner. They suggest that the
opposing actions of positive and negative co-factors can help Bcd switch between its active and
inactive states in a manner that is independent of Bcd concentration.

Figure 1. Exogenous dCBP switches the activity states of Bcd(A57-61) in S2 cells. (A) Shown is a schematic diagram of Bcd and its interacting domains with
co-factors. The homeodomain (residues 92-151) of the 489 amino acid Bcd protein is marked with a black box and the two neighboring mutations discussed in this
report, A52-56 and A57-61, are each marked with an 'X'. The interaction information in this diagram is based on (25) for Sin3A, (24,26) for SAP18 and (23) for CBP.
The diagram is not drawn to scale. (B) Shown are CAT assay results in S2 cells that were transfected with the reporter plasmid hb-CAT (1 µg), the indicated
amounts of effector plasmids expressing wt Bcd or Bcd(A57-61), with (+) or without (-) another effector plasmid (5 µg) expressing dCBP. Fold activation by wt Bcd
(at 1 µg transfected DNA) without exogenous dCBP was set to 100.

MATERIALS  AND  METHODS 

Plasmid  construction 

Plasmids expressing Bcd derivatives were generated in two steps as described previously (23).
The bcd gene was first modified on pFY441, a pGEM3-based plasmid containing wt bcd linked to the
coding sequence of the hemagglutinin (HA) tag, and then transferred to pFY442, a plasmid
expressing HA-tagged wt Bcd from the Drosophila actin 5C promoter (24). For Bcd(1-246; A57-61),
the pGEM3-based plasmid was pDF333 and the expression plasmid was pFD347. Reporter genes and the
effector plasmids pFY443 [Bcd(1-246)] and pFY465 [Bcd(A57-61)] have been described previously
(14,24). The expression plasmids of wt and mutant dCBP were kindly provided by Dr S. Smolik.

Transient  transfection  assays 

Drosophila S2 cells were transfected with plasmids by the calcium phosphate co-precipitation
method as described by Invitrogen. The total amount of DNA in each transfection was adjusted to
10 µg with salmon sperm DNA. In order to monitor the transfection efficiency, 1 µg of the control
plasmid pCopia-lacZ was co-transfected in each experiment, and both CAT assays and western blot
analyses were normalized according to the β-galactosidase activity. CAT activity was measured as
previously described by using three independently transfected samples for each experiment (14).
The protein levels of Bcd derivatives with or without dCBP were detected by western blot using
anti-HA antibody (1:500 final dilution, Babco). Double-stranded RNA against endogenous dCBP in S2
cells was generated as described previously (23), and the RNAi treatment did not affect the
accumulation of Bcd in S2 cells.
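
The normalization described above can be illustrated with a short Python sketch. It is a hedged example under stated assumptions rather than the authors' procedure: the function names (`normalized_cat`, `relative_activity`) are hypothetical, dividing CAT activity by β-galactosidase activity is one common way to correct for transfection efficiency and is assumed here, and averaging the replicate samples before scaling to a reference condition set to 100 is likewise an assumption.

```python
from statistics import mean


def normalized_cat(cat_activity, bgal_activity):
    """Correct one sample's CAT activity for transfection efficiency
    by dividing by its beta-galactosidase activity (assumed method)."""
    if bgal_activity <= 0:
        raise ValueError("beta-galactosidase activity must be positive")
    return cat_activity / bgal_activity


def relative_activity(samples, reference_samples):
    """Average normalized CAT activity over replicate (cat, bgal) pairs and
    express it relative to a reference condition that is set to 100."""
    value = mean([normalized_cat(c, b) for c, b in samples])
    reference = mean([normalized_cat(c, b) for c, b in reference_samples])
    return 100.0 * value / reference
```

Each argument would hold the three independently transfected samples for one condition, with the reference condition chosen by the experimenter.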


Gel  shift  assays 

The hb enhancer probe for gel shift experiments was released from a plasmid and filled in with
Klenow in the presence of [α-32P]dCTP as described previously (17). Wild-type Bcd and its
derivative used in these assays were expressed in vitro by using the TnT quick-coupled
transcription/translation system (Promega). The experimental procedures and conditions for gel
shift assays were described previously (17).

Chromatin-immunoprecipitation  (ChIP)  assays 

ChIP assays were performed according to Fu et al. (23). The presence of Bcd, GTFs and acetylated
histones at the hb enhancer-core promoter region was detected by PCR using primers hb-core5 and
hb-CAT3 as described previously (23).

Co-immunoprecipitation  (Co-IP) 

Co-IP experiments were performed as described previously (23). Briefly, nuclear extracts prepared
from S2 cells were incubated with anti-HA antibody (1:100 final dilution) in IP buffer (20 mM
Tris-HCl, pH 8.0, 160 mM MgCl2, 0.1% Nonidet P-40 and 10% glycerol). The precipitated products
were resolved by SDS-PAGE and detected by western blot using anti-Sin3A antibodies [kindly
provided by Drs Lori Pile and David Wassarman (27)]. Quantitation of the co-IP data shown in
Figure 5 was conducted as follows. The intensities of input and co-IP Sin3A bands for each sample
were measured to obtain an intensity ratio of co-IP product over input. For each experiment, the
ratio for Bcd transfection alone (lane 4) was arbitrarily set to 100 to allow comparison of data
from independent experiments.
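
The quantitation steps above amount to a ratio followed by a rescaling, which can be sketched as follows. This is a minimal illustration: the function names and the dictionary layout (lane number mapped to a pair of band intensities) are hypothetical choices for this example, and only the two steps themselves (co-IP over input, then reference lane set to 100) come from the text.

```python
def coip_ratio(coip_intensity, input_intensity):
    """Intensity ratio of the co-IP Sin3A band over the input Sin3A band."""
    if input_intensity <= 0:
        raise ValueError("input band intensity must be positive")
    return coip_intensity / input_intensity


def normalize_to_reference(bands, reference_lane):
    """Rescale co-IP/input ratios so that the reference lane equals 100.

    `bands` maps a lane number to (co-IP intensity, input intensity).
    """
    ratios = {lane: coip_ratio(c, i) for lane, (c, i) in bands.items()}
    reference = ratios[reference_lane]
    if reference == 0:
        raise ValueError("reference lane has no co-IP signal")
    return {lane: 100.0 * r / reference for lane, r in ratios.items()}
```

Calling `normalize_to_reference(bands, reference_lane=4)` for each independent experiment would put the experiments on a common scale before they are compared.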

RESULTS 

High levels of dCBP switch Bcd(A57-61) to an active state

Our previous transfection experiments in S2 cells have shown
that Bcd(A57-61) has a strengthened self-inhibitory function
and is nearly completely inactive on the hb-CAT reporter
gene (17,25). This mutant protein is stably accumulated in
cells (25), suggesting that its inability to activate hb-CAT
reflects a distinct functional state of this protein rather than
a defect in protein stability. Unlike wt Bcd, which exhibited
a dose-dependent activation function in transfection assays
(Figure 1B, solid line), this mutant protein failed to activate
hb-CAT at all concentrations tested (Figure 1B, dashed line,
bottom). To determine whether high concentrations of the
co-activator dCBP might counteract the strengthened
self-inhibitory function and switch Bcd(A57-61) to an active
state, we conducted co-transfection experiments. In these
experiments, the ability of Bcd(A57-61) to activate hb-CAT
was measured in the presence or absence of dCBP exogenously
expressed from a transfected plasmid.

Our results showed that exogenous dCBP dramatically
rescued the activity of Bcd(A57-61) on hb-CAT, increasing
its activity by 29- to 110-fold depending on Bcd concentration
(Figure 1B, dashed line, top; also see Table 1). In the presence
of exogenous dCBP, the activity of Bcd(A57-61) at several
concentrations was higher than that of wt Bcd at its saturating
concentrations (without exogenous dCBP). Table 1 lists the effect
of dCBP on wt Bcd and Bcd(A57-61), further indicating that
Bcd(A57-61) responds to dCBP much more robustly than wt
Bcd does at all concentrations tested. As shown previously,
exogenous dCBP has no effect on reporter gene expression
in the absence of Bcd and does not alter the amount of Bcd
protein in cells (23). Together, these results suggest that
dCBP is a limiting co-factor for Bcd(A57-61) and is capable
of switching this Bcd protein between its inactive and
active states on the hb-CAT reporter in cells.
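
Fold-increase values of the kind listed in Table 1 are ratios of reporter activity with and without exogenous dCBP at a given amount of transfected Bcd plasmid. A minimal Python sketch of that arithmetic is given here; the function name `fold_increase` and the CAT activity numbers are hypothetical, chosen only for illustration, and do not reproduce the measured data.

```python
def fold_increase(activity_with, activity_without):
    """Ratio of reporter activity with a co-factor to activity without it."""
    if activity_without <= 0:
        raise ValueError("activity without co-factor must be positive")
    return activity_with / activity_without

# Hypothetical CAT activities (arbitrary units), keyed by micrograms of
# transfected Bcd plasmid. Each value is (without dCBP, with dCBP).
# These numbers are NOT the measured data.
cat_activity = {
    0.1: (2.0, 50.0),
    1.0: (4.0, 40.0),
}

fold_by_dose = {
    dose: fold_increase(plus, minus)
    for dose, (minus, plus) in cat_activity.items()
}
print(fold_by_dose)
```

Computing the ratio separately at each dose is what allows the comparison made in the text, where the response to dCBP differs across Bcd concentrations.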

Rescue of Bcd(A57-61) activity by dCBP requires
the C-terminal domain of Bcd

It has been shown that Bcd and dCBP can physically interact
with each other (23). Deletion analysis further suggested
that the C-terminal half of Bcd plays an important role in


Table 1. The effect of dCBP on wt Bcd and Bcd(A57-61)

DNA transfected (µg)    Effect of dCBP (fold increase)
                        wt Bcd      Bcd(A57-61)
0.01                    17          58
0.03                    19          72
0.1                     4.8         110
0.3                     4.7         61
1.0                     7.5         29

Listed is the effect (fold increase) of dCBP on wt Bcd and Bcd(A57-61) in
activating the hb-CAT reporter gene in S2 cells. The amount of transfected
DNA refers to the plasmids expressing the Bcd derivatives. The data for wt
Bcd and Bcd(A57-61) are from Fu et al. (23) and Figure 1B, respectively.


responding to the co-activator function of dCBP on the
hb-CAT reporter (23). To determine whether this domain
is required for mediating the activity-rescuing function of
dCBP, we analyzed the effect of exogenous dCBP on a
truncated derivative of Bcd, Bcd(1-246). Two versions of
Bcd(1-246), with either the wt sequence or the A57-61 mutation
at the N-terminus, were used in the experiments. As shown
previously (23), the truncated derivative Bcd(1-246) responded to
dCBP modestly (Figure 2A). However, dCBP failed to rescue
the activity of the truncated, mutant protein Bcd(1-246;
A57-61) at all concentrations tested (Figure 2A). dCBP did
not affect the accumulated levels of the Bcd proteins
(Figure 2B). These results suggest that dCBP rescues the
activity of Bcd(A57-61) through the C-terminal domain
of Bcd.

Defect  of  Bcd(A57-61)  in  hb  enhancer  recognition 
in  cells  but  not  in  vitro 

Bcd(A57-61) has a normal ability to bind to a single TAATCC
site when analyzed in vitro (25). To determine whether this
mutant protein might be defective in recognizing natural
enhancer elements that contain multiple Bcd binding sites,
we conducted gel shift studies using the hb enhancer element.
As shown previously (17), wt Bcd bound to this enhancer
element in a cooperative manner, forming protein-DNA
complexes that contained multiple Bcd molecules (Figure 3,
lanes 1-4). Our gel shift experiments using Bcd(A57-61)
showed that this mutant protein can bind to the hb enhancer
element in a manner comparable with the wt Bcd protein


[Figure 2 image. Panel A x-axis: Bcd DNA transfected (µg), 0.01, 0.03, 0.1, 0.3 and 1.0. Panel B: western blot of HA-Bcd, lanes 1-4.]

Figure 2. Switch of Bcd(A57-61) activity states by dCBP requires Bcd C-terminal domain. (A) Shown are CAT assay results in S2 cells that were transfected with
the reporter plasmid hb-CAT (1 µg), the indicated amounts of effector plasmids expressing two different Bcd derivatives, with (+) or without (-) the effector
plasmid (5 µg) expressing dCBP. The two Bcd derivatives are Bcd(1-246), a truncated Bcd with a wt N-terminus; Bcd(1-246; A57-61), a truncated derivative with
the A57-61 mutation in its self-inhibitory domain. Fold activation for each assay, measured by CAT activity, is shown in the figure. (B) Western blot data showing the
HA-tagged Bcd protein levels (1 µg transfected DNA) in the presence (+) or absence (-) of dCBP (5 µg transfected DNA).
(Figure 3, lanes 5-8). Bcd(A57-61) also bound to another
natural enhancer element, kni, in a cooperative manner similar
to the wt Bcd protein in vitro (data not shown). These results
further support our conclusion that the A57-61 mutation of
Bcd does not abolish the protein's ability to recognize DNA
in vitro (25).


[Figure 3 image: gel shift on the hb enhancer probe. Lanes 1-4, Bcd (wt); lanes 5-8, Bcd(A57-61).]

Figure 3. Enhancer element binding by Bcd(A57-61) in vitro. Gel shift data
showing DNA binding to the hb enhancer element by wt Bcd (lanes 1-4)
and Bcd(A57-61) (lanes 5-8). Free probe is indicated by an arrowhead
(bottom right). In the absence of Bcd, there were no shifted complexes detected
(data not shown).


To further dissect the defects of Bcd(A57-61), and thus help
explain the mechanisms of the functional rescue by dCBP,
we carried out a ChIP analysis in cells. We compared the
occupancy of wt Bcd and Bcd(A57-61) at the hb-CAT
reporter. We specifically chose conditions in which Bcd
proteins were expressed at high levels to reveal functional
defects of the mutant Bcd that could not be overcome by
increased Bcd concentrations (also see Figure 1B for reporter
assay data). As shown previously (23), our ChIP experiments
detected a significant occupancy of wt Bcd at the hb enhancer
element of the reporter gene [Figure 4B (a), lane 9]. In
contrast, Bcd(A57-61) failed to exhibit an occupancy above
background levels at the hb enhancer element in the same
ChIP assays [Figure 4B (a), lane 11]. Together, these results
suggest that Bcd(A57-61), despite its normal ability to bind
DNA in vitro, has a functional defect in accessing the hb
enhancer element in cells.

dCBP  restores  the  occupancy  of  Bcd(A57-61) 
at  hb  enhancer  in  cells 

To determine whether dCBP can affect the ability of Bcd(A57-61)
to access the hb-CAT reporter gene in cells, we conducted
ChIP experiments in the presence of exogenously expressed
dCBP (see Figure 4A for a schematic diagram of the reporter
gene). As shown by the ChIP data [Figure 4B (a), lanes 11 and
12], dCBP restored the occupancy of Bcd(A57-61) at the hb
enhancer element in cells [Figure 4B (a), lane 12]. Under the
conditions of high Bcd concentrations, dCBP had little effect
on wt Bcd (lanes 9 and 10) as shown previously (23).

Our ChIP experiments also revealed a restored occupancy
of GTFs at the hb-CAT reporter caused by high levels of dCBP.
In the absence of exogenous dCBP, Bcd(A57-61) failed to
enhance the occupancy of either TBP or TFIIB at the promoter
region [Figure 4B (b and c), compare lanes 7, 9 and 11]. In the
presence of exogenous dCBP, Bcd(A57-61) increased the
Figure 4. Restored occupancy of Bcd(A57-61) and GTFs by exogenous dCBP. (A) Shown is a schematic diagram of the reporter gene, marking the promoter region
(thin line) used for detection by PCR in ChIP assays. The diagram is not drawn to scale. (B) Shown are ChIP data from S2 cells that were transfected with plasmids
expressing the indicated effectors [dCBP proteins, 5 µg; Bcd(A57-61), 1 µg] and the hb-CAT reporter plasmid (1 µg). Antibodies used for ChIP assays were: HA to
detect HA-tagged Bcd (panel a), TBP (panel b), TFIIB (panel c), acetyl-H3 (panel d) and H4 (panel e). Lanes 1-6 show input controls, which represent the PCR
product of 1% of the total isolated DNA used in the ChIP assays.
occupancy of both TFIIB and TBP (lane 12). Finally, our ChIP
experiments showed that dCBP increased the acetyl-H3 and
H4 levels at the reporter in the presence of either wt Bcd or
Bcd(A57-61) [Figure 4B (d and e), lanes 9-12]. In all
cases, the effects of dCBP required the presence of Bcd or
its derivative (compare lanes 7 and 8), indicating that these
observed effects represent Bcd-dependent functions of dCBP.
Together, these results reveal not only a restored occupancy
of Bcd(A57-61) at the hb-CAT reporter, caused by increased
dCBP levels in cells, but also an elevated recruitment of GTFs
and an increased histone acetylation level at the reporter.

dCBP  may  negatively  affect  Bcd-Sin3A 
interaction  in  cells 

As further detailed in Discussion (below), several models are
consistent with our finding that high levels of dCBP can restore
activity to Bcd(A57-61). For example, it is possible that dCBP
and Sin3A may compete for Bcd interaction, thus representing
antagonistic forces that influence Bcd function. To determine
whether the interaction between Bcd and Sin3A might be
affected by dCBP, we conducted co-IP experiments in cells
with altered dCBP levels. To reduce cellular levels of dCBP,
we used an RNAi approach, which has been shown to specifically
affect Bcd activity without altering the amount of
Bcd in cells (23). We used exogenously expressed dCBP to
increase its cellular levels. As shown in our co-IP experiments
(Figure 5A), the amount of Sin3A precipitated by Bcd was
increased by dCBP RNAi treatment (lane 6) and marginally
affected by dCBP overexpression (lane 5). Figure 5B shows
the quantitation of the data from three independent experiments
(relative amounts of co-IP Sin3A for lanes 4, 5 and 6
Figure 5. Interaction between Bcd and Sin3A may be affected by dCBP.
(A) The interaction between HA-tagged Bcd and the endogenous Sin3A in
S2 cells was detected by a co-IP analysis (see Materials and Methods for
details). Sin3A co-precipitated by anti-HA antibodies was detected by western
blot using anti-Sin3A antibodies. S2 cells were either transfected (+) or not (-)
with the indicated plasmids expressing HA-Bcd (1 µg) and dCBP (5 µg), and
had either been subject (+) or not (-) to dCBP RNAi treatment (25 µg dsRNA).
In this figure, lanes 1-3 are controls showing that no Sin3A was precipitated
in the absence of HA-Bcd. Input represents one-tenth of total nuclear extract
used in the co-IP assay as described previously (23). (B) The relative amounts
of the co-IP Sin3A product in three independent experiments were quantified
(see Materials and Methods) and the results are shown (mean ± SD); all lanes in
this graph correspond to those in (A).


are 100, 87 ± 16 and 276 ± 88, respectively). These results
suggest that dCBP may negatively affect the interaction
between Bcd and Sin3A. The modest effect of dCBP on
Sin3A-Bcd interaction suggests that such an antagonistic
effect may represent only one of several individually
weak mechanisms by which dCBP rescues the activity of
Bcd(A57-61) (see Discussion for further details).


The  HAT-deficient  mutant  dCBP  can  partially  rescue 
Bcd(A57-61)  activity 

Our previous experiments have shown that dCBP can increase
the activity of wt Bcd through both HAT-dependent and
-independent mechanisms (23). A HAT-independent action of
dCBP suggests a structural role of this protein in regulating
Bcd activity, a suggestion consistent with the observed negative
effect of dCBP on Bcd-Sin3A interaction (Figure 5). To
specifically determine whether the HAT activity of dCBP is
required for its ability to restore function to Bcd(A57-61) on
the hb-CAT reporter, we used a HAT-deficient mutant of dCBP
(28); both wt dCBP and this mutant protein are accumulated to
similar levels when expressed in S2 cells (23). As shown in
Figure 6, this mutant dCBP partially increased the activity of
wt Bcd on the hb-CAT reporter [lanes 4-6; also see (23)]. It
also rescued, though with a reduced efficiency, the activity of
Bcd(A57-61) on the hb-CAT reporter (lanes 7-9). Together,
these results suggest that dCBP can play an enzyme
activity-independent role in rescuing the activity of Bcd(A57-61) on
the hb-CAT reporter in cells.


[Figure 6 image: bar graph of CAT assay results, y-axis scale 0-250, lanes 1-9.]

Bcd (1 µg)       -     -     -     wt    wt    wt    A57-61  A57-61  A57-61
dCBP (5 µg)      -     wt    mut   -     wt    mut   -       wt      mut
Fold increase    -     1.2   1.1   -     28    12    -       76      29

Figure 6. HAT-deficient dCBP can partially rescue the activity of Bcd(A57-61).
Shown are CAT assay results in S2 cells that were transfected with the
reporter plasmid hb-CAT (1 µg), the effector plasmids (1 µg) expressing the
indicated Bcd proteins, with (+) or without (-) another effector plasmid
(5 µg) expressing dCBP. The increase of Bcd activity by wt and mutant dCBP
proteins is also indicated in the figure as fold increase. The exogenously
expressed wt and mutant dCBP proteins are accumulated at similar levels
in S2 cells as described previously (23).
DISCUSSION

As a molecular morphogen, Bcd can undergo switches, in
a concentration-dependent manner, between its active and
inactive states in activating transcription of its target genes.
The experiments described in this report suggest another
mechanism that can facilitate on-off switches of Bcd activity
in a Bcd concentration-independent manner. In particular,
the mutant Bcd(A57-61) is incapable of activating the
hb-CAT reporter gene in S2 cells at all concentrations tested
(Figure 1B). The inability of this mutant Bcd to activate the
hb-CAT reporter reflects a distinct functional state of this
protein rather than a defect in protein stability. In fact,
this same mutant protein is only modestly weaker than the
wt protein on another reporter gene, kni-CAT, which contains
the Bcd-responsive kni enhancer element (17). These and
other results suggested that the A57-61 mutation may cause
the functionally inactive state of Bcd on hb-CAT by promoting
a more efficient interaction with a co-repressor protein(s), such
as Sin3A and its associated complex(es) (24,25). The experiments
described in this report show that increased concentrations of dCBP
can restore activity to Bcd(A57-61) on the hb-CAT reporter
in cells. These results suggest that the opposing actions of
positive and negative co-factors can help Bcd switch
between its active and inactive states in a manner that is Bcd
concentration-independent.

Although Bcd(A57-61) can bind to both a single site and
natural enhancer elements in vitro, it is unable to access the
hb enhancer element in cells (Figures 3 and 4). These results
suggest that the DNA binding defect of this mutant protein
is only manifested in a cellular context. This notion is consistent
with our finding that the PAH domains of Sin3A do
not exhibit any increased ability to reduce DNA binding by
Bcd(A57-61) in vitro when compared with wt Bcd (data not
shown) (25). We propose that other co-repressors or those that
are associated with Sin3A, such as the HDACs, can reduce the
ability of Bcd to access a natural enhancer in cells. It is possible
that the enzymatic HDAC activity that is more stably
associated with Bcd(A57-61) makes it unable to negotiate
with histones for accessing DNA. It is also possible that a
more stable Bcd-co-repressor complex may sterically hinder
the interaction between Bcd(A57-61) molecules and prevent
cooperative binding to the enhancer element in cells.

The most striking finding of this report is that high levels
of dCBP can switch Bcd(A57-61) from its inactive state
to an active one on the hb-CAT reporter in cells. Our ChIP
data further show that dCBP increases both the ability of
Bcd(A57-61) to access the hb enhancer element in cells and
the occupancy of GTFs at the reporter promoter (Figure 4B).
How does dCBP switch the activity states of Bcd(A57-61) on
hb-CAT in cells? Since Bcd and dCBP can physically interact
with each other through multiple domains (23) (Figure 1A),
it is possible that dCBP may increase the DNA binding ability
of Bcd in cells by stabilizing the interaction between Bcd
molecules and thus enhancing its cooperativity. It is also possible
that dCBP may physically compete with co-repressor
complexes in interacting with Bcd. Our co-IP results suggest
that dCBP may negatively affect the interaction between Bcd
and Sin3A in cells (Figure 5). dCBP could also play a role in
facilitating the interaction between Bcd and the transcription
machinery. For all these actions, dCBP may play a structural
(rather than enzymatic) role (Figure 6). Finally, the fact that
the HAT-defective mutant of dCBP does have a reduced ability
to restore activity to Bcd(A57-61) (Figure 6) indicates
that its enzymatic activity has a positive role, possibly through
modifications of histones. It is likely that dCBP can affect the
Bcd(A57-61) activity through multiple mechanisms that may
be weak individually (Figures 5 and 6) but, when combined,
can lead to a dramatic switch from its inactive state to an active
one on the hb-CAT reporter in cells.

Currently, it is poorly understood how precisely Bcd activates
transcription. Previous studies suggest that much of its
activation function is conferred by the C-terminal portion of
Bcd (16,29). This portion of the protein contains several
domains, including the acidic, glutamine-rich and alanine-rich
domains, that are characteristic of activation domains
capable of interacting with components of the transcription
machinery (16,29-31). Interestingly, the alanine-rich domain
previously thought to play an activation role was shown
recently to exhibit an inhibitory function instead (32). The
C-terminal domain of Bcd can also interact with dCBP (23),
and our results show that this domain is responsible for mediating
the activity-switching function of dCBP (Figure 2).
Although much of the activation function of Bcd is provided
by its C-terminal domain, the N-terminal portion of the protein
also contains some activation function. Studies have
shown that Bcd(1-246), a derivative lacking the entire
C-terminal portion of Bcd, can rescue the bcd⁻ phenotype
when expressed at high levels (33). These results suggest
that Bcd can achieve its activation function through multiple
domains, presumably by interacting with different proteins,
including co-activators and components of the transcription
machinery. The results described in this report further support
the importance of dCBP in facilitating activation by Bcd.

Bcd is a morphogenetic protein whose behavior can be
regulated not only by its own concentration but also by the
enhancer architecture (17). Our recent experiments show that,
on the kni and hb enhancer elements, the N-terminal domain of
Bcd is preferentially used for either cooperative DNA binding
or self-inhibition, respectively (17). We propose that the interaction
between Bcd molecules bound to the kni enhancer
element, through its N-terminal domain, can interfere with
its interaction with co-repressors, such as Sin3A. As described
in this report, co-activators such as dCBP and co-repressors
such as Sin3A can also functionally antagonize each other,
possibly by competing for Bcd interaction as part of the mechanisms
(Figure 5). Bcd is more sensitive to the self-inhibitory
function on the hb enhancer element than on the kni enhancer
element (17); consistent with dCBP's antagonistic role, dCBP
increases the activity of Bcd more robustly on the hb enhancer
element than on the kni enhancer element (23). However, the
interplay between positive and negative activities that regulate
Bcd functions is probably far more complex than
simple physical competition: as already discussed above,
dCBP can affect Bcd activity through multiple mechanisms
in both HAT-dependent and -independent manners (Figure 6)
(23). Moreover, in the presence of exogenous dCBP, high
levels of Bcd(A57-61) cause a reduction in its activity on
the hb-CAT reporter in cells (Figure 1B), a reduction that is
not observed with wt Bcd (23), suggesting that the optimal
concentration ratio between Bcd and dCBP may vary depending
on the strengths of the self-inhibitory function and
interaction with co-repressors. In addition, high concentrations
of dCBP can rescue the inactive derivative Bcd(A57-61), but
not another inactive derivative lacking the C-terminal portion,
Bcd(1-246; A57-61), suggesting that the Bcd-dCBP
interaction strength can also influence the balance between
positive and negative activities that regulate Bcd function.

The experiments described in this report suggest that an
activator's function is subject to intricate controls by both
positive and negative activities in cells. A fine balance
between these activities is critical for normal cellular and
developmental processes. Our transgenic experiments show
that both Bcd(A57-61), which has a strengthened
self-inhibitory function, and Bcd(A52-56), which has a weakened
self-inhibitory function, cause embryonic defects [(24) and
unpublished data]. In addition, embryos with reduced dCBP
activity exhibit defects in early expression patterns of a
Bcd target gene, even-skipped [(23) and Y. Wen, A. York
and J. Ma, unpublished data]. Finally, a recent study reveals
that mutations affecting SAP18, a component of the
Sin3A-HDAC complex, can alter Bcd function and anterior patterning
in embryos (34). In addition to the co-factors discussed here
(Sin3A, dCBP and SAP18), Bcd likely has the ability to interact
with many other proteins, including not only regulatory
proteins but also components of the transcription machinery
(30,31). Precisely how all these different proteins harmoniously
regulate and facilitate the execution of Bcd functions
during development remains to be determined. Recent studies
have shown that the Bcd gradient in embryos possesses a
strikingly sophisticated ability to activate its target genes in
a precise manner (35-37). These findings further underscore
the need for intricate control mechanisms that allow Bcd to
switch between its active and inactive states in target gene
activation. Our studies suggest that on-off switches of
Bcd activity can be achieved not only in a Bcd
concentration-dependent manner but also in a Bcd
concentration-independent manner. It remains to be investigated whether
and how Bcd-interacting proteins, including those yet to be
identified, participate in the precision control of target gene
activation during development.
ABSTRACT: We have determined the solution structure of a complex containing the K50 class homeodomain
Pituitary homeobox protein 2 (PITX2) bound to its consensus DNA site (TAATCC). Previous studies
have suggested that residue 50 is an important determinant of differential DNA-binding specificity among
homeodomains. Although structures of several homeodomain-DNA complexes have been determined,
this is the first structure of a native K50 class homeodomain. The only K50 homeodomain structure
determined previously is an X-ray crystal structure of an altered specificity mutant, Engrailed Q50K
(EnQ50K). Analysis of the NMR structure of the PITX2 homeodomain indicates that the lysine at position
50 makes contacts with two guanines on the antisense strand of the DNA, adjacent to the TAAT core
DNA sequence, consistent with the structure of EnQ50K. Our evidence suggests that this side chain may
make fluctuating interactions with the DNA, which is complementary to the crystal data for EnQ50K.
There are differences in the tertiary structure between the native K50 structure and that of EnQ50K,
which may explain differences in affinity and specificity between these proteins. Mutations in the human
PITX2 gene are responsible for Rieger syndrome, an autosomal dominant disorder. Analysis of the residues
mutated in Rieger syndrome indicates that many of these residues are involved in DNA binding, while
others are involved in formation of the hydrophobic core of the protein. Overall, the role of K50 in
homeodomain recognition is further clarified, and the results indicate that native K50 homeodomains
may exhibit differences from altered specificity mutants.


The  homeodomain  is  an  evolutionarily  conserved  protein 
domain  found  in  organisms  ranging  from  yeast  and  Droso¬ 
phila  to  humans  (7—5).  Homeodomain-containing  proteins 
are  known  to  play  important  roles  in  such  diverse  activities 
as  embryonic  pattern  formation,  cell-type  specification,  and 
differentiation  (5).  This  domain  is  responsible  for  recognizing 
specific  DNA  sequences,  thereby  recruiting  the  correspond¬ 
ing  transcription  factors  to  specific  target  genes.  Most  DNA 
sites  that  homeodomains  recognize  consist  of  only  6  base 
pairs,  with  a  common  TAAT  core  sequence  followed  by  two 
base  pairs  that  have  been  proposed  to  define  the  specificity 
(. 3 ,  4).  The  homeodomain  consists  of  a  self-folding,  stable 
protein  domain  of  60  amino  acids.  Previous  studies  of 
homeodomains  have  shown  that  they  have  a  compact  3 -helix 
stmcture  and  a  flexible  N-terminal  arm  (3,  5).  The  third  helix, 
called  the  recognition  helix,  makes  specific  contacts  within 
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the  major  groove  of  the  DNA.  Homeodomains  have  been 
studied  extensively  (7—5),  including  genetic,  biochemical, 
and  structural  analyses,  due  to  their  critical  role  in  cellular 
processes  and  to  the  fact  that  they  serve  as  a  valuable  model 
for  probing  the  physical  basis  of  protein— DNA  interactions. 
Although  the  overall  topology  of  the  homeodomain  motif 
and  its  docking  arrangement  on  duplex  DNA  are  now 
generally  well-defined,  fundamental  questions  remain,  par¬ 
ticularly  in  regard  to  the  role  of  amino  acid  side  chains  in 
defining  the  specificity  of  homeodomain— DNA  binding  and 
the  nature  of  the  interactions  of  these  side  chains  with  the 
DNA  binding  sites.  Homeodomains  have  evolved  different 
DNA  specificities  in  part  by  altering  the  amino  acid  residue 
at  position  50,  which  can  interact  with  base  pairs  5  and  6, 
and  to  a  lesser  extent,  base  pair  4,  in  the  TAATNN  consensus 
binding  site.  A  previous  study  has  shown  that  each  of  6 
different  amino  acids  tested  at  position  50  confers  a  different 
DNA  binding  specificity  (<5).  Tucker- Kellogg  et  al.  (7)  and 
others  have  emphasized  the  point  that  the  degree  of  specific¬ 
ity  of  a  homeodomain  for  its  particular  DNA  binding  sites 
depends  on  the  identity  of  the  amino  acid  residue  in  position 
50.  Most  homeodomains  contain  a  glutamine  residue  at  this 
position,  and  are  therefore  referred  to  as  Q50  homeodomains. 
Q50  homeodomains  prefer  DNA  sequences  such  as  TAAT- 
TA  and  TAATGG.  The  homeodomain  of  Bicoid,  which  is  a 
Drosophila  morphogenetic  protein,  contains  a  lysine  at 
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position  50  and  is  the  founding  member  of  the  K50  class  of 
homeodomains  ( 8 ).  The  K50  class  of  homeodomains  rec¬ 
ognizes  a  consensus  DNA  sequence  of  TAATCC.  Much 
attention  has  been  focused  on  the  consequences  of  lysine 
being  located  at  position  50,  largely  due  to  the  fact  that  the 
most  dramatic  examples  of  altered  DNA  specificity  occur 
when  a  lysine  is  either  introduced  or  replaced  at  position 
50.  For  example,  when  Q50  in  Engrailed  is  mutated  to  an 
alanine,  the  Q50A  mutant  has  an  affinity  and  specificity  that 
are  very  similar  to  those  of  the  wild-type  protein,  but  when 
mutated  to  a  lysine,  the  specificity  changes  from  TAATTA 
to  TAATCC,  clearly  demonstrating  the  important  role  played 
by  the  residue  in  position  50,  especially  in  the  case  of  K50, 
in defining the specificity of DNA binding (9–11). Percival-Smith
et al. (12) investigated wild-type and Q50K mutant
Fushi tarazu homeodomains in conjunction with altering the
base pairs at positions 5 and 6 in the binding site, and found
that differences in KD of ~100-fold are observed when the
binding site is not the optimal one. In addition to position
50, position 47 has a role in defining specificity for some
homeodomains in correlation with base pair 4 of the binding
site, especially when the residue is phenylalanine or arginine
(13, 14).

Although structures of several homeodomains and
homeodomain–DNA complexes have been determined by
X-ray crystallography or NMR spectroscopy (15), including
representatives of the wild-type Q50, S50, C50, G50, and
I50 classes of homeodomains (16–20), the only experimentally
determined K50 homeodomain structure available is an
X-ray crystal structure of an altered specificity mutant,
Engrailed Q50K (EnQ50K¹), bound to the TAATCC site (7).
The latter study found that the side chain of K50 projects
into the major groove of the DNA and makes hydrogen bond
contacts with the O6 and N7 atoms of the guanines at base
pairs 5 and 6 of the complementary strand of the TAATCC
binding site. This is the only case in which direct hydrogen
bond contacts have been reported for amino acid residue 50
in any homeodomain–DNA complex structure. Unfortunately,
the relevance of the EnQ50K studies, or analyses of
other mutants such as Paired S50K (6) and Fushi tarazu
Q50K (27), to the case of native K50 homeodomains is
unclear  in  the  absence  of  experimental  structural  data  for  a 
native  K50  homeodomain.  For  example,  the  identity  of  the 
amino  acid  residue  at  position  54  seems  to  be  constrained 
by  the  residue  at  position  50.  A  glutamine  at  position  50 
allows  for  many  different  residues  to  be  present  at  position 
54,  with  Met  being  the  most  abundant.  However,  Met54  is 
never  found  when  position  50  is  lysine  (22).  Determining 
the  biological  relevance  of  studies  of  single  site  mutants 
should  take  into  account  possible  covariation  of  residues  (23). 
For example, structural studies of an EnQ50A mutant have
been conducted (11) in order to provide additional information
concerning the role of residue 50 in general, and Q50
in particular; however, a phage display selection of Engrailed
mutants failed to recover a Q50A single mutant: the only Q50A
mutant recovered also contained an I47T mutation (24).
Another issue concerning the Engrailed Q50K mutant is the
observation that it binds to the consensus TAATCC site with
an unusually high affinity, which approaches the picomolar
range (9). There is no evidence that natural K50 class
homeodomains have such a high affinity for DNA (25, 26).
The full-length PITX2 protein has a KD of 50 nM (25). A
KD was determined for the Q50K mutant of the Fushi tarazu
homeodomain, and this value was found to be 0.63 nM (12),
which is a much lower affinity than that of the EnQ50K mutant.
The KD for the PITX2 homeodomain alone was found to be
2.6 ± 0.38 nM (see Supporting Information), which is
comparable to that of the Fushi tarazu mutant, and also a much
lower affinity than that of the Engrailed Q50K mutant. Moreover,
the X-ray structure of EnQ50K reveals two conformations
with the side chain of K50 contacting either the 5th or 6th
position on the antisense strand of the DNA. Whether or
not natural K50 homeodomains exhibit these two conformations
in a static state and whether or not the side chain of
K50 exhibits a fluctuating state were unknown. Another
biochemical property shared by the wild-type K50 class
homeodomains Bicoid and PITX2 is that they both can
recognize naturally occurring DNA sites that deviate from
the consensus site TAATCC (2, 21, 27–29). In contrast, a
Q50K mutant of the Fushi tarazu homeodomain is unable
to bind any nonconsensus DNA binding sites tested (27).
Together, these considerations underscore the importance of
obtaining solution structures of native K50 class homeodomains.

¹ Abbreviations: PITX2, pituitary homeobox protein 2; EnQ50K,
Engrailed Q50K mutant; IPTG, isopropyl-β-D-thiogalactopyranoside;
DSS, 2,2-dimethyl-2-silapentane-5-sulfonic acid; PBS, phosphate buffer
solution; TCB, thrombin cleavage buffer; DTT, dithiothreitol; NOESY,
nuclear Overhauser enhancement spectroscopy; HSQC, heteronuclear
single quantum correlation; NOE, nuclear Overhauser effect; rmsd,
root-mean-square deviation; bb, backbone; PDB, Protein Data Bank; HD,
homeodomain.

The question regarding side-chain conformational heterogeneity,
referred to above in the context of the observations
concerning the K50 side chain in the EnQ50K crystal
structure, is broader in scope and of fundamental importance
for understanding the full range of interactions that can occur
at a protein–DNA interface. Crystallographic studies have
generally indicated that there are several conserved and
relatively stable contacts at the homeodomain–DNA interface.
In several instances, such as the aforementioned case
of K50 in the EnQ50K structure and the case of Gln50 in
the crystal structure of an Even-skipped homeodomain
complex (30), multiple, significantly populated conformations
are  observed  for  the  side  chain,  while  the  nearly  invariant 
asparagine  in  position  51  is  observed  to  make  very  stable 
contacts  with  the  N6  and  N7  atoms  of  the  adenine  base  in 
position  3  of  the  consensus  TAAT  core  binding  site.  On  the 
other  hand,  NMR  studies  (31,  32)  and  molecular  dynamics 
simulations  (31,  33)  have  provided  strong  indications  of  a 
dynamic,  fluctuating  environment  encompassing  some  of  the 
key  amino  acid  side  chains  at  the  interface,  most  importantly, 
the  side  chains  of  asparagine  51  and  of  the  position  50 
residue. Billeter and co-workers (31) proposed that, at least
in  the  case  of  Antennapedia,  the  homeodomain  achieves 
specificity  through  a  fluctuating  network  of  short-lived 
contacts  that  allow  it  to  recognize  DNA  without  the  entropic 
cost  that  would  result  if  side  chains  were  immobilized  upon 
DNA  binding.  Significant  interest  has  been  expressed  in  the 
literature  (7,  31,  33,  34)  for  obtaining  experimental  data  on 
native  K50  homeodomains  in  order  to  shed  further  light  on 
these  fundamental  issues. 

Solution Structure of the PITX2 Homeodomain

Biochemistry, Vol. 44, No. 20, 2005 7499

(a) Human PITX2 HD

        1          11         21
    GS  QRRQRTHFTS QQLQQLEATF QRNRYPDMST
        31         41         51      60
        REEIAVWTNL TEARVRVWFK NRRAKWRKRE EFIVTD

Figure 1: (a) Amino acid sequence of the PITX2 homeodomain used for the structural studies. The extra residues at the N- and C-termini
are part of the expression system. (b) DNA sequence of the binding site used in the structural studies. The DNA sequence consists of the
TAATCC binding site surrounded by residues to confer stability to the double strand.

For the structure determination studies reported herein, we
chose the well-characterized homeodomain PITX2 as our
representative model for the K50 class of homeodomains.
PITX2  is  a  transcription  factor  that  is  found  in  many 
developing  tissues  in  vertebrate  embryos.  It  has  been  shown 
to  be  expressed  in  the  brain,  heart,  pituitary,  mandibular  and 
maxillary regions, eye, gut, limbs, and umbilicus (35–38).
The homeodomain is just one domain of the full-length
PITX2 protein. Three major isoforms of PITX2 have been
identified, and these isoforms are produced by
alternative splicing and the use of different promoters (35,
36, 39–41). All of the isoforms contain different N-terminal
domains, while the homeodomain and C-terminal domains
are identical. The C-terminal region contains a transcriptional
activation domain. Mice that have been genetically engineered
to be homozygous for a pitx2 null allele have a single
atrium, arrested development of the pituitary gland, numerous
defects of the eye, and altered development of the mandibular
and maxillary regions (41–43). Mutations in PITX2 are
known to be a cause of Rieger syndrome (44–48), a genetic
disease characterized by defects in the eye, facial features,
and umbilicus. Sequencing of DNA from human patients has
shown that many mutations in PITX2 result in single amino
acid substitutions within the homeodomain region (45–48).
Although  modeling  and  biochemical  studies  have  provided 
useful  insights  into  the  effect  of  these  mutations  (refer  to 
Table  2),  no  structure-based  information  is  available  on  how 
these  mutations  might  affect  the  PITX2  homeodomain 
properties. 

In  the  present  study,  the  NMR  solution  structure  of  the 
PITX2  homeodomain  bound  to  its  consensus  DNA  binding 
site  is  reported.  This  represents  the  first  experimentally 
determined  structure  of  a  native  K50  class  homeodomain. 
The  results  reveal  a  tertiary  structure  similar  to  other 
homeodomains,  with  K50  making  contacts  with  the  two 
guanines  adjacent  to  the  TAAT  core  DNA  sequence. 
Evidence  indicates  that  K50  may  interact  with  DNA  in  a 
flexible  manner.  The  tertiary  structure  of  PITX2  indicates 
that  the  first  two  helices  are  slightly  closer  to  each  other 
than  in  the  EnQ50K  mutant,  and  the  third  helix  has  a  slightly 
different  position  relative  to  the  other  helices.  We  discuss 
how  these  structural  properties  of  PITX2  might  affect  the 
biochemical functions of the PITX2 homeodomain. On the
basis of the solution structure, we also discuss the effect of
mutations  found  in  Rieger  syndrome  patients  on  PITX2 
homeodomain  functions. 

MATERIALS  AND  METHODS 

Expression  of  the  PITX2  Homeodomain.  The  expression 
plasmid pGEX-l2t-PITX2HD consists of a glutathione
S-transferase tag and a thrombin cleavage site prior to the
PITX2  homeodomain  sequence.  There  are  two  extra  residues 
at  the  N-terminus  as  a  result  of  thrombin  cleavage,  and  six 
extra  residues  at  the  C-terminus  that  are  part  of  the  expression 
system.  The  final  protein  sequence  is  shown  in  Figure  1. 

The  PITX2  homeodomain  was  obtained  by  growing 
Escherichia  coli  strain  BL21-Star  (Invitrogen)  transformed 
with pGEX-l2t-PITX2HD in minimal medium [0.85 g/L
NaOH, 10.5 g/L K2HPO4, 12 g/L Na2HPO4, 6 g/L KH2PO4,
1 g/L NaCl, 6 mg/L CaCl2, 13.2 mL/L concentrated (12.2
N) HCl, nucleotides (0.5 g/L adenine, 0.65 g/L guanosine,
0.2 g/L thymine, 0.5 g/L uracil, 0.2 g/L cytosine), vitamins
(1 mg/L choline chloride, 1 mg/L pyridoxal phosphate, 100
µg/L riboflavin, 50 mg/L thiamine, 50 mg/L niacin, 1 mg/L
biotin), and trace elements (107 µg/L MgCl2·6H2O, 20 µg/L
FeCl2·4H2O, 0.7 µg/L CaCl2·2H2O, 0.26 µg/L H3BO3, 0.16
µg/L MnCl2·4H2O, 16 ng/L CuCl2·2H2O, 2.4 µg/L
Na2MoO4·2H2O, 10 pM FeCl3, 135 mM CaCl2, 50 µM ZnSO4)]
containing 150 mg/L ampicillin, 4 g/L glucose, and 1 g/L
NH4Cl. Half a liter of bacterial culture was grown in baffled
flasks in an incubator shaker at 37 °C until saturation (A600
~ 5.0). This culture was spun down (2000g, 10 min) and
resuspended in 1 L of minimal medium enriched with 10%
15N13C-Isogro, 13C-glucose, and 15N-ammonium chloride
(Sigma-Aldrich). Expression was then induced by addition
of  IPTG  to  a  final  concentration  of  0.1  mM,  followed  by 
growth  at  20  °C  for  approximately  24  h. 

Purification of the PITX2 Homeodomain. Cells were lysed
with lysozyme and sonication. Cleared lysate was applied
to a glutathione sepharose column and washed with PBS
buffer, followed by thrombin cleavage buffer (50 mM trizma
hydrochloride, 150 mM NaCl, 2.5 mM CaCl2, pH 8.0). The
resin was resuspended in thrombin cleavage buffer (TCB)
and transferred to a 50 mL conical tube. The homeodomain
was cleaved from the glutathione S-transferase fusion tag
using 1 mg of thrombin for 3 h at 4 °C. Nearly complete
cleavage was obtained during this time as measured by
SDS–PAGE. The cleaved protein was then eluted from the
resin using 5 bed volumes of TCB. It was loaded onto an SP
sepharose fast flow column (2 mL bed volume, Amersham),
washed with washing buffer (10 mM NaH2PO4, 250 mM
NaCl, pH 7.0), and eluted with buffer containing a higher
salt concentration (10 mM NaH2PO4, 1 M NaCl, pH 7.0).
Fractions containing the homeodomain were identified by
Abs27g (extinction coefficient = 18350 cm-1 M-1), pooled,
and dialyzed overnight at 4 °C in 10 mM NaH2PO4, pH 7.0.
Protein yields were ~4.5 mg/L of cell growth. The consensus
DNA duplex (IdtDNA) (see Figure 1) was added to give a
1:1 protein:DNA ratio, and the complex was concentrated
by burying the dialysis bag in Spectra/Gel Absorbent
(Spectrum), or Aquacide (Calbiochem). Samples were dialyzed
in 10 mM NaH2PO4, pH 7.0 after concentration.
Complete protease inhibitors (Roche, 1 tablet in 3 mL, add
1 µL), leupeptin (0.3 mM), DTT (2 mM), and Pefabloc (0.2
mM, Roche) were all added to inhibit proteases. Sodium
azide (6 mM stock, add 1 µL to 540 µL NMR sample) was
added to prevent bacterial growth in the sample. The final
sample concentration was approximately 1 mM, in 90% 10
mM NaH2PO4, pH 7.0 and 10% D2O.
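
The extinction coefficient quoted above can be turned into a concentration estimate with the Beer-Lambert law. The following is a minimal sketch, not part of the published protocol: the 1 cm path length and the example absorbance reading are assumptions chosen only for illustration, and the function name is hypothetical.

```python
# Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l).
# EXTINCTION_COEFF is the value quoted in the text; everything else is illustrative.

EXTINCTION_COEFF = 18350.0  # M^-1 cm^-1, from the purification paragraph


def molar_concentration(absorbance, path_length_cm=1.0, epsilon=EXTINCTION_COEFF):
    """Return the molar concentration implied by an absorbance reading.

    path_length_cm defaults to 1.0 cm, which is an assumption (a standard cuvette),
    not a value stated in the text.
    """
    if path_length_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and extinction coefficient must be positive")
    return absorbance / (epsilon * path_length_cm)


# Hypothetical reading of 0.367 absorbance units for a pooled fraction.
example_conc = molar_concentration(0.367)
print(f"Estimated concentration: {example_conc * 1e6:.1f} micromolar")
```

At this coefficient, a 1 mM NMR sample would have to be diluted substantially before its absorbance falls in a range that is practical to read.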

Determination of KD. Measurements of KD were taken by
following the procedure of Dave et al. (2). The DNA probe
concentrations used in this analysis were 1, 2, 4, 8, 16, 20,
and 40 nM. Quantitative gel shift assays were performed by
measuring the bound and free fractions of the probes with a
PhosphorImager as previously described (27). The data were
analyzed using Microsoft Excel (linear regression analysis)
to determine the KD value (−1/KD = slope of the plot of
bound/free against bound DNA). These results can be seen
in the Supporting Information.
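
The regression described above (slope of bound/free against bound equals −1/KD) can be sketched in a few lines. This is an illustrative reimplementation in Python, not the spreadsheet analysis used in the study. The data points are synthetic, generated from an assumed KD of 2.6 nM and an assumed binding capacity of 1.0 nM to show that the fit recovers the input value, and the function names are hypothetical.

```python
# Scatchard-style analysis: for simple 1:1 binding,
#   bound / free = (Bmax - bound) / KD
# so a plot of bound/free against bound has slope -1/KD.


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scatchard_kd(bound, free):
    """Estimate KD (same units as the inputs) from bound and free probe concentrations."""
    ratios = [b / f for b, f in zip(bound, free)]
    slope, _ = linear_fit(bound, ratios)
    if slope >= 0:
        raise ValueError("non-negative slope: data do not fit a simple binding model")
    return -1.0 / slope


# Synthetic, noise-free data (nM); TRUE_KD and BMAX are assumptions for the demo.
TRUE_KD = 2.6
BMAX = 1.0
bound_nM = [0.1, 0.2, 0.4, 0.6, 0.8]
free_nM = [TRUE_KD * b / (BMAX - b) for b in bound_nM]

print(f"Fitted KD: {scatchard_kd(bound_nM, free_nM):.2f} nM")
```

With real gel-shift data the points scatter around the line, and the uncertainty in the slope carries over into the reported error on KD.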

NMR  Structure  Determination.  All  NMR  experiments  were 
carried  out  on  Varian  Inova  600  and  800  MHz  spectrometers. 
The  sample  temperature  was  set  to  295  K.  Spectra  were 
referenced  to  an  external  DSS  standard. 

Protein Assignments. Protein 1H, 13C, and 15N resonance
assignments were obtained primarily from heteronuclear-edited
NMR spectra, using conventional triple resonance
1H{13C,15N} NMR probes. The pulse programming codes were
written in-house. Approximately 92% of assignable atoms
were assigned. Sequence-specific assignments of the backbone
HN, N, C', Cα, and Cβ resonances were obtained from 3D
HNCO, HN(CO)CA, HNCA, CBCA(CO)NH, and HNCACB
(49–55) spectra. Assignment of the aliphatic side-chain
resonances was accomplished using a combination of 3D
15N-edited-TOCSY-HSQC (56, 57), H(CCO)NH-TOCSY,
and HBHA(CBCACO)NH spectra (55). Aromatic 1H and
13C resonances were obtained from a combination of 2D
HMQC, 2D HMQC-TOCSY, 3D HMQC-TOCSY, and 2D
NOESY-HMQC spectra (58, 59). An HNHA experiment was
performed to assign Hα and to obtain coupling constants (60).

DNA  Assignments.  Resonance  assignments  for  unlabeled 
DNA  bound  to  13C,15N-labeled  protein  were  obtained  using 
standard  assignment  methods  for  DNA  (61).  The  data  was 
obtained  with  doubly  13C/15N-filtered  NOESY  and  ^-filtered 
TOCSY  experiments  (62,  63). 


Structural  Constraints.  The  main  source  of  structural 
information was the proton–proton distance constraints
identified from NOESY spectra. Three-dimensional 15N-NOESY-HSQC
experiments (64) using 50–125 ms mixing
times were used for intramolecular restraints in the
homeodomains, along with a 13C-NOESY-HSQC experiment using
a 150 ms mixing time (55).

Intramolecular distance restraints for the DNA were
obtained from an r^-6 scaling of cross-peak volumes in the
NOESY spectra. Upper and lower bounds were calibrated
on the cytosine intraresidue H5–H6 NOE and set to ±15%
of the calculated distance for base and H1' protons. Restraint
boundaries to other sugar protons were widened an additional
10% to account for effects of spin diffusion. Restraints from
the longer mixing time 125 ms experiment were assigned a
lower bound of 3 Å and an upper bound of 5 Å.
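
The r^-6 calibration described above can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the authors' actual procedure: the cytosine H5–H6 reference distance of 2.46 Å is a commonly used textbook value that is not given in the text, the cross-peak volumes are invented, and "widened an additional 10%" is interpreted here as a tolerance of ±25%.

```python
# NOE cross-peak volume V scales as r^-6, so with a reference pair of known distance:
#   r = r_ref * (V_ref / V) ** (1/6)

REF_DISTANCE_ANGSTROM = 2.46  # assumed cytosine H5-H6 distance, not stated in the text


def noe_distance(volume, ref_volume, ref_distance=REF_DISTANCE_ANGSTROM):
    """Distance (in angstroms) implied by a cross-peak volume relative to the reference peak."""
    if volume <= 0 or ref_volume <= 0:
        raise ValueError("cross-peak volumes must be positive")
    return ref_distance * (ref_volume / volume) ** (1.0 / 6.0)


def restraint_bounds(distance, tolerance=0.15):
    """Lower and upper bounds as a symmetric fractional tolerance around the distance."""
    return distance * (1.0 - tolerance), distance * (1.0 + tolerance)


# Hypothetical volumes in arbitrary units.
ref_peak_volume = 6400.0
base_peak_volume = 800.0

r = noe_distance(base_peak_volume, ref_peak_volume)
low, high = restraint_bounds(r)                    # base and H1' protons: +/-15%
low_sugar, high_sugar = restraint_bounds(r, 0.25)  # other sugar protons: assumed +/-25%
print(f"r = {r:.2f} A, bounds {low:.2f}-{high:.2f} A (sugar: {low_sugar:.2f}-{high_sugar:.2f} A)")
```

Because of the sixth-root dependence, an eightfold drop in volume corresponds to only about a 41% increase in distance, which is why fairly generous bounds are still informative.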

Intermolecular restraints between the protein and DNA
were obtained from 2D 13C(ω1)-edited, [13C,15N](ω2)-filtered
NOESY spectra (63, 65, 66). The NOEs were assigned
manually, and only unambiguously assigned peaks were used
as restraints in the docking calculation. Weak peaks were
assigned an upper distance limit of 6.0 Å, while medium
peaks had an upper distance limit of 5.0 Å, and stronger
peaks an upper distance limit of 4.0 Å.

Data  Processing  and  Analysis.  Raw  NMR  data  was 
processed using NMRPipe (67). Linear prediction was used
in the t1 dimension for 2D spectra and in the t1 and t2
dimensions for 3D spectra, using sinebell window functions
for  apodization  and  zero  filling  in  all  dimensions.  Spectra 
were  viewed  and  analyzed  using  the  Sparky  graphical 
interface  (68).  This  program  was  used  to  pick  peaks  and 
integrate  them  using  a  Lorentzian  function. 

Structure Calculation. Referenced chemical shift assignments
and peak intensities from Sparky were entered into
the structure calculation program CYANA (69, 70). CYANA
includes an automated NOE assignment program, CANDID,
which automatically assigns all NOESY cross-peaks,
taking into account nearness of chemical shift, network
anchoring, ambiguous distance constraints, and constraint
combination (70). The structure is then calculated using the
DYANA algorithm, which calculates structures using torsion
angle dynamics (69). Calibration constants for peak intensities
versus upper distance limits were determined automatically
by CYANA. The 20 lowest energy conformers were
retained after structure calculation and used for docking to
DNA.

Docking  of  the  Protein  to  the  DNA.  The  protein  was 
docked  to  the  DNA  using  the  AMBER  all-atom  force  field 
with  the  generalized  Born  solvation  model  (71,  72).  The  20 
CYANA  structures  with  the  lowest  values  of  target  function 
were  docked  onto  canonical  B-form  DNA.  This  was  chosen 
as  the  starting  DNA  structure,  since  NOESY  spectra  for  the 
complex  indicated  the  DNA  to  be  close  to  B-form.  Starting 
structures  of  the  complex  were  generated  by  systematically 
placing  PITX2  in  varying  orientations  relative  to  the  DNA, 
with helix 3 approximately 50 Å from the DNA. For each
of the 20 lowest-energy structures, 5 different orientations
of  the  protein  relative  to  the  DNA  were  selected,  yielding 
100  starting  conformers. 

The protein was docked onto the DNA by a 20 ps
simulated annealing calculation (T = 600 K, time step = 1
fs) using an altered version of a procedure described
previously for docking of TFIIIA to DNA (73). The
temperature was increased from 0 to 600 K over the first 4
ps, held at 600 K for 2 ps, and then slowly cooled to 0 K
over 14 ps. The weights of the force constants were linearly
increased from 0.1 to 1 during the course of the calculation.
DNA base-pairing was maintained by incorporating
Watson–Crick hydrogen-bonding restraints. These Watson–Crick
DNA restraints were implemented as lower and upper
bound restraints on base-paired heteroatom–heteroatom (2.7
to 3.1 Å) and heteroatom–proton distances (1.67 to 2.07
Å), and had a final force constant of 50 kcal mol-1 Å-2.
The intramolecular protein and DNA restraints had final force
constants of 20 kcal mol-1 Å-2. Protein and DNA angle
restraints had a final force constant of 32 kcal mol-1 Å-2.
Protein–DNA intermolecular restraints had a final force
constant of 32 kcal mol-1 Å-2. Protein restraints were applied
to prevent the protein conformation from being altered too
much from the structure calculated by CYANA. DNA
restraints were applied to prevent fraying of the DNA, and
to maintain the structure close to B-form.
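
The annealing protocol above amounts to a piecewise temperature schedule plus a linear ramp of the restraint weights. The sketch below writes both out as plain functions so the timing can be checked (4 + 2 + 14 = 20 ps, or 20,000 steps at 1 fs). It is not AMBER input and does not reproduce the actual control files used. The cooling stage is assumed to be linear, since the text says only "slowly cooled", and all names are hypothetical.

```python
# Temperature and restraint-weight schedule for the 20 ps docking run described in the text.

TOTAL_PS = 20.0       # total length of the simulated annealing run
HEAT_END_PS = 4.0     # 0 -> 600 K over the first 4 ps
HOLD_END_PS = 6.0     # held at 600 K for 2 ps
T_MAX_K = 600.0
TIME_STEP_PS = 0.001  # 1 fs


def temperature(t_ps):
    """Target temperature in kelvin at time t_ps (cooling assumed linear)."""
    if not 0.0 <= t_ps <= TOTAL_PS:
        raise ValueError("time outside the 0-20 ps annealing window")
    if t_ps < HEAT_END_PS:
        return T_MAX_K * t_ps / HEAT_END_PS
    if t_ps <= HOLD_END_PS:
        return T_MAX_K
    return T_MAX_K * (TOTAL_PS - t_ps) / (TOTAL_PS - HOLD_END_PS)


def restraint_weight(t_ps, start=0.1, end=1.0):
    """Scale factor applied to the final force constants, ramped linearly over the run."""
    if not 0.0 <= t_ps <= TOTAL_PS:
        raise ValueError("time outside the 0-20 ps annealing window")
    return start + (end - start) * t_ps / TOTAL_PS


n_steps = round(TOTAL_PS / TIME_STEP_PS)
print(f"{n_steps} steps")
for t in (0.0, 2.0, 5.0, 13.0, 20.0):
    print(f"t = {t:4.1f} ps  T = {temperature(t):5.1f} K  weight = {restraint_weight(t):.2f}")
```

Under this reading, the intermolecular restraints quoted at a final value of 32 kcal mol-1 Å-2 would start the run at about 3.2 kcal mol-1 Å-2, which lets the protein settle onto the DNA before the restraints reach full strength.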

Structure  Refinement  and  Analysis.  The  20  structures  with 
the lowest AMBER energy values were subjected to restrained
energy minimization by the SANDER module of
the AMBER 7.0 package (71, 72). The 1994 version of the
force  field  was  used.  Each  conformer  was  subjected  to  a 
conjugate-gradient  energy  minimization  calculation  with 
solvent  included. 

The  evaluation  of  the  structure,  i.e.,  analysis  of  geometry, 
stereochemistry,  and  energy  distributions  in  the  models,  was 
performed  using  the  program  PROCHECK  (74).  Restraint 
violations  were  analyzed  using  the  program  AQUA  (75). 
Graphics  were  prepared  using  MOLMOL  (76). 

RESULTS  AND  DISCUSSION 

Structure Determination. Assignments of the protein
backbone and side-chain 1H, 13C, and 15N resonances were
obtained from heteronuclear spectra. Restraint data derived
for the PITX2 homeodomain–DNA complex are summarized
in Table 1. Analysis of 15N and 13C heteronuclear-edited
NOESY spectra recorded at various mixing times
provided 1259 intramolecular distance restraints comprising
513 intraresidue, 338 sequential, 300 medium-range, and 108
long-range NOE contacts. Torsional restraints for 55 φ and
43 ψ angles were obtained from a 3D HNHA experiment,
and from using Cα chemical shifts (77, 78). Overall, there
are 19 restraints per residue, on average, for intramolecular
protein NOEs. All of these protein restraints were used for
structure calculation with the program CYANA (version
1.0.6) (69, 70). After the final round of structure calculation,
the 20 structures with the lowest CYANA target function
were used for docking to DNA and energy minimization.
The final average CYANA target function for the 20
structures was 2.05 Å2.

A  total  of  292  distance  restraints  between  protons  within 
the  DNA  were  obtained  from  13C/15N-filtered  NOE  spectra. 
A series of 2D 13C(ω1)-edited, [13C,15N](ω2)-filtered NOESY
spectra  provided  27  unambiguous  intermolecular  restraints 
between  the  protein  and  the  DNA.  These  restraints  were 
entered  into  the  program  AMBER  for  docking  and  energy 
minimization,  as  described  in  Materials  and  Methods. 
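
The restraint counts quoted in the two paragraphs above can be cross-checked with simple arithmetic. The sketch below tallies them and computes the per-residue average. The residue count of 68 is an assumption (2 extra N-terminal residues, 60 homeodomain residues, and 6 extra C-terminal residues, as described in Materials and Methods); the text does not state which count was used for the average.

```python
# Tally of the NMR restraints reported for the PITX2 homeodomain-DNA complex.

PROTEIN_NOE = {
    "intraresidue": 513,
    "sequential": 338,
    "medium_range": 300,
    "long_range": 108,
}
DIHEDRAL = {"phi": 55, "psi": 43}
DNA_NOE = 292
INTERMOLECULAR_NOE = 27
N_RESIDUES = 68  # assumed: 2 + 60 + 6 residues in the expressed construct


def tally():
    """Return the subtotals, grand total, and protein NOEs per residue."""
    protein_noe = sum(PROTEIN_NOE.values())
    dihedral = sum(DIHEDRAL.values())
    total = protein_noe + dihedral + DNA_NOE + INTERMOLECULAR_NOE
    return {
        "protein_noe": protein_noe,
        "dihedral": dihedral,
        "total": total,
        "noe_per_residue": protein_noe / N_RESIDUES,
    }


summary = tally()
print(summary)
```

Under this assumed residue count the average comes to about 18.5 protein NOEs per residue, in line with the figure of 19 quoted in the text.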

Quality of the NMR Structure. The structure of the
PITX2–DNA complex was calculated by a restrained
molecular dynamics docking and energy minimization
procedure starting from the coordinates of the PITX2 protein
calculated from CYANA and canonical B-form DNA as
described in Materials and Methods. The 20 structures with
the lowest total energies were selected for conformer analysis.
These structures exhibited mean AMBER energies of −6268
kcal mol-1 and mean van der Waals and electrostatic energies
of −399 and −3974 kcal mol-1, respectively. The mean
AMBER energies given represent the intraprotein interaction
energy.

Table 1. NMR Structure Statistics (a)

NMR constraints
  protein
    distance constraints                         1259
      intraresidue                                513
      sequential                                  338
      medium-range                                300
      long-range                                  108
    dihedral constraints                           98
      phi                                          55
      psi                                          43
  DNA                                             292
  protein–DNA (intermolecular)                     27
  total                                          1676
CYANA target function value (Å2) (b)      2.05 ± 0.39
no. of violations
  distance violations (>0.30 Å)                     0
  dihedral angle violations (>5.0°)                 1
AMBER energies (kcal/mol) (c)
  mean AMBER energy                       −6268 ± 250
  van der Waals                             −399 ± 33
  electrostatic                           −3974 ± 336
Ramachandran plot (%) (d)
  residues in most favored regions               80.1
  residues in additional allowed regions         14.8
  residues in generously allowed regions          2.3
  residues in disallowed regions                  2.8
rmsd from the mean structure (Å)
  protein (bb, residues 3–58)                    1.38
  all heavy atoms (residues 3–58)                1.95
  protein (bb, all residues)                     1.85
  DNA (residues 68–78, 81–91)                    1.30
  complex (residues 3–58, 68–78, 81–91)          1.81

(a) A total of 30 conformers were calculated, and the 20 structures
with the smallest residual CYANA target function values were subjected
to docking and energy minimization. (b) The value given for the CYANA
target function corresponds to the value before energy minimization
(the CYANA target function is not defined after energy minimization,
since the conformers no longer have ECEPP standard geometry). (c) The
value given represents the intra-protein interaction energy. (d) For
residues 3–58.

The superposition of the structures (Figure 2) demonstrates
a well-defined tertiary structure for PITX2 bound to DNA.
The structures have no distance violations greater than 0.3
Å, and only 1 angle violation greater than 5°. Analysis of
Ramachandran plots for the ensemble indicates that the
structures generally show favorable backbone conformations
within allowed conformational space, with 80.1% of the
residues 3–58 within the most favored regions, 14.8% in
additionally allowed regions, 2.3% in generously allowed
regions, and 2.8% in disallowed regions for the 20 conformers
(Table 1). The N- and C-termini are largely disordered.
When superimposed, residues 3–58 have an average
root-mean-square deviation (rmsd) from the mean structure of
1.38 Å for backbone (N, Cα, C', and O), 1.95 Å for all heavy
atoms, and 1.85 Å for the backbone when all residues are
included. The global rmsd for all DNA heavy atoms
(nucleotides 68–78, 81–91) is 1.30 Å. The rmsd for the
entire complex (residues 3–58; nucleotides 68–78, 81–91)
is 1.81 Å.

Figure 2: Ensemble of structures of the PITX2 homeodomain–DNA
complex. (a) Ensemble of 20 structures showing the protein
backbone N, Cα, and C' atoms and the DNA backbone. Helix 1 is
colored pink, helix 2 green, helix 3 purple, and the DNA strands
are coral. Superimposition was performed using backbone atoms
from protein and DNA. (b) Alternate view of the structure, rotated
by approximately 90°.

Tertiary Structure of the PITX2 Homeodomain–DNA
Complex. The overall tertiary structure of the PITX2
homeodomain is similar to other homeodomains, supporting
previous findings that this tertiary structure is well conserved
among homeodomains (7, 3, 79, 80). The tertiary structure
of the PITX2 homeodomain is composed of three α helices
(Figure 3). Helix 1 (residues 10–20) is followed by a loop
region, and then helix 2 (residues 28–37) runs antiparallel
to helix 1. Helix 2 and helix 3 (residues 42–58) form a
helix–turn–helix motif. Helix 3 is approximately perpendicular
to helices 1 and 2, and fits into the major groove of
the DNA. The N-terminus of the homeodomain makes
contacts within the minor groove of the DNA.

The  helices  of  the  PITX2  homeodomain  are  held  together 
by  a  core  of  eight  tightly  packed  hydrophobic  amino  acids 


Figure  3:  Structure  of  the  PITX2  homeodomain— DNA  complex, 
(a)  Mean  structure  of  the  homeodomain.  Helix  1  is  colored  pink, 
helix  2  green,  and  helix  3  purple.  The  DNA  strands  are  colored 
coral,  (b)  Ribbon  diagram  of  the  mean  structure  of  the  PITX2 
homeodomain— DNA  complex. 

(F8,  L13,  L16,  F20,  L40,  V45,  W48,  and  F49).  These  amino 
acids  are  either  invariant  (W48  and  F49)  or  highly  conserved 
in  all  homeodomains  ( 3 ,  81,  82).  In  a  threading  analysis 
performed  previously  for  the  PITX2  homeodomain  (83),  it 
was  hypothesized  that  the  tertiary  structure  of  the  PITX2 
homeodomain  would  be  similar  to  other  homeodomains, 
mainly  because  many  of  these  hydrophobic  amino  acids  that 
are  present  in  other  homeodomains  are  also  present  in  the 
PITX2  homeodomain.  The  threading  analysis  threaded  the 
PITX2  homeodomain  sequence  to  the  Engrailed  home¬ 
odomain  structure,  so  the  overall  tertiary  structure  ended  up 
being  very  close  to  that  of  Engrailed.  The  threading  analysis 
did  not  provide  a  PDB  file  that  we  could  analyze  in  detail, 
and  in  any  case  is  not  necessarily  indicative  of  the  true 
molecular  structure.  For  this  threading  analysis,  the  focus 
was  the  role  of  Rieger  mutations  in  causing  disease,  and  there 
was  no  discussion  of  the  role  of  K50  in  determining  the 
DNA-binding  affinity  and  specificity  of  the  homeodomain, 
which  is  something  best  addressed  via  an  experimentally 
determined  structure  rather  than  a  threading  model.  This 
study  also  did  not  analyze  the  K50  Rieger  mutants,  so  there 
is  no  indication  what  the  structure  of  this  side  chain  was  in 
their analysis. In the absence of an experimentally determined
structure, the threading model was most useful for visualizing
some  of  the  intramolecular  interactions  that  stabilize  the 
tertiary  structure,  and  the  predicted  interactions  are  consistent 
with  our  experimental  data.  While  we  cannot  compare  our 
PITX2  tertiary  structure  directly  to  that  of  the  threaded 
structure, we can compare it to EnQ50K and other homeodomain
structures. In our experimentally determined structure,
the first helix is closer to the second helix when
measured from the backbone nitrogen of L16 to the backbone
nitrogen of I34 and compared to the EnQ50K, Antennapedia,
wild-type Engrailed, Fushi tarazu, vnd/NK-2, and MATα2
homeodomains (7, 84–88). As far as this distance is
concerned, PITX2 is an outlier compared to the other six
homeodomains. This distance ranges from 9.60 to 10.70 Å
for Antennapedia conformers, and is 9.43 Å for the crystal EnQ50K
structure, 9.54 Å for wild-type Engrailed, 9.30–11.10 Å for
Fushi tarazu, 8.67 Å for vnd/NK-2, and 10.9 Å for MATα2.
However, for PITX2 this distance range over the 20
conformers is only 7.55–8.58 Å, which is an average of 1.8
Å closer. In view of the rmsd for the PITX2 protein backbone
atoms (residues 3–58) of 1.38 Å, this result is still significant
when compared with the ranges of distances seen in the
structures of the other homeodomains. This difference is
especially significant when considering that the rmsd for
helices 1 and 2 alone is only 0.78 Å. The range for the
distance between L16 and I34 for all of the other homeodomains
together is 8.67–11.10 Å, and the PITX2 distance
range  is  completely  outside  of  this. 
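
The comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the procedure used in the study: it takes the L16 N to I34 N distances quoted in the text, and it assumes that a reported range can be summarized by its midpoint (the text does not state how the "average of 1.8 Å" was obtained). The variable and function names are illustrative only.

```python
# L16 N - I34 N distances (in Å) as quoted in the text.
# Single values are stored as (value, value); ranges as (low, high).
reported = {
    "Antennapedia": (9.60, 10.70),
    "EnQ50K": (9.43, 9.43),
    "Engrailed (wild type)": (9.54, 9.54),
    "Fushi tarazu": (9.30, 11.10),
    "vnd/NK-2": (8.67, 8.67),
    "MATalpha2": (10.9, 10.9),
}
pitx2 = (7.55, 8.58)  # range over the 20 PITX2 conformers


def midpoint(rng):
    """Midpoint of a (low, high) range; an assumed summary statistic."""
    return (rng[0] + rng[1]) / 2.0


def combined_range(ranges):
    """Smallest low and largest high over a collection of ranges."""
    ranges = list(ranges)
    return min(lo for lo, _ in ranges), max(hi for _, hi in ranges)


others_low, others_high = combined_range(reported.values())
mean_other = sum(midpoint(r) for r in reported.values()) / len(reported)
difference = mean_other - midpoint(pitx2)
no_overlap = pitx2[1] < others_low

print(f"other homeodomains: {others_low:.2f}-{others_high:.2f} Å")
print(f"PITX2 is closer by about {difference:.2f} Å; no overlap: {no_overlap}")
```

Under the midpoint assumption, the difference comes out close to the 1.8 Å quoted in the text, and the PITX2 range lies entirely below the combined range of the other six homeodomains.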

In  addition  to  the  narrower  distance  between  the  first  and 
second  helices  in  the  PITX2  structure,  there  are  several  other 
differences  between  the  PITX2  and  EnQ50K  structures.  In 
particular, the third helix of PITX2 is positioned about 0.5
Å lower (closer to the N-terminus of helix 1 and C-terminus
of  helix  2)  than  in  EnQ50K  (Figure  4).  This  difference  in 
orientation  of  the  three  helices  causes  slightly  different 
contacts  to  be  made  between  the  first  and  third  helices,  and 
may  provide  a  partial  explanation  for  the  decreased  stability 
of this homeodomain. Unlike other homeodomains that are
stable in the free form (32, 86, 89–92), the PITX2 homeodomain
is unstable in the absence of DNA in that it
irreversibly  aggregates  at  micromolar  concentrations,  which 
suggests  a  possible  lack  of  stable  tertiary  structure  in  the 
free  form.  This  may  be  due  to  slightly  different  hydrophobic 
interactions  within  the  core  of  the  protein,  and  the  absence 
of  other  stabilizing  interactions  such  as  the  salt  bridge  linking 
residues 19 and 30, which can be present in most homeodomains
(93) but is not possible in PITX2. One difference
seen  here  is  that  F49,  which  is  nearly  invariant  among 
homeodomains,  points  slightly  upward  toward  the  loop 
region  of  the  homeodomain,  instead  of  pointing  toward  the 
interior  of  the  protein.  The  orientation  of  the  first  helix  in 
relation  to  the  third  would  cause  a  steric  clash  with  F49  if  it 
were  in  an  orientation  similar  to  that  of  other  homeodomains. 
While  there  is  still  an  interaction  involving  F49  and  F20 
within  the  hydrophobic  core  of  the  PITX2  homeodomain, 
the  orientations  of  the  side  chains  themselves  are  different. 
This  differing  orientation  may  lessen  the  strength  of  the 
interaction  between  the  first  and  third  helices,  which  may 
affect  the  stability  of  the  protein  in  the  absence  of  DNA. 
This  difference  in  orientation  may  be  due  to  any  number  of 
differing  residues  between  the  two  homeodomains  (see 
Supporting  Information).  One  possibility  is  a  proline  residue 


Figure 4: Overlay of PITX2 homeodomain and EnQ50K homeodomain
structures. Cyan corresponds to the structure of the PITX2
homeodomain, and black corresponds to the structure of the
Engrailed mutant homeodomain. (a) Helices 1 and 2 are approximately
1.8 Å closer to each other in PITX2 than in other
homeodomains. (b) Alternate view, rotated by approximately 90°.
Helix 3 is about 0.5 Å lower in PITX2 than in EnQ50K.

that  is  found  in  the  loop  region  between  helices  1  and  2  in 
PITX2,  but  is  not  present  in  Engrailed  or  Fushi  tarazu. 

In the PITX2 structure, the N- and C-terminal segments
(residues -2 to 2 and 60 to 68; Figure 1) appear disordered (Figure
2), which is to be expected on the basis of a lack of medium-range
and long-range constraints for these residues. Analysis
of 15N relaxation data (unpublished) indicates that residues
-2 to 2 and 59 to 68 are more mobile in solution, explaining
the  observed  disorder  and  lack  of  restraint  information  for 
these  regions. 

Our  study  also  reveals  structural  information  about  the 
DNA  when  it  is  bound  to  the  protein.  Distance  restraints 
obtained  from  the  experiments  described  above  for  assigning 
the  DNA  were  entered  into  AMBER  during  the  docking 
procedure.  Visual  inspection  of  the  structure  of  the  PITX2 
homeodomain-DNA complex indicates that there is a slight
widening of the minor groove of the DNA compared to
B-form DNA, and a concomitant narrowing of the major
groove. Previous structures of protein-DNA complexes have
indicated  that  changes  in  DNA  structure  are  possible  upon 
protein  binding  (94).  A  more  thorough,  quantitative  analysis 
of  the  DNA  structure  when  PITX2  is  bound  will  not  be 
possible  until  a  high-resolution  structure  of  the  DNA  is 
determined,  using  isotopically  labeled  DNA  (95). 

Protein-DNA Recognition. Analysis of the filtered NOESY
experiments produced 27 unambiguous distance restraints
between the protein and the DNA (see Supporting Information
for a list). These include contacts that have been seen

Figure 5: Detailed view of the protein-DNA interface and protein-DNA contacts. The backbone of the protein is shown in beige, with
the Cα-Hα bonds shown with small blue lines. Side chains of the protein are illustrated in cyan. On the DNA, blue corresponds to guanine
residues, green to cytosine, pink to adenine, and purple to thymine. (a) View of the protein-DNA NOE contacts between the N-terminus
of the PITX2 homeodomain and the minor groove of the DNA. (b) View of the protein-DNA NOEs between Y25, R31, and the DNA.
(c, d) View of protein-DNA NOE contacts between residues in the third helix and the major groove of the DNA.


in  other  biochemical  and  structural  studies  of  homeodomains. 
Many  of  the  residues  that  interact  with  the  DNA  are 
arginines,  including  R3  and  R5  at  the  N-terminus,  R31  in 
the  second  helix,  and  R46,  R52,  and  R53  in  the  third  helix 
(Figure  5).  Other  residues  that  were  found  to  make  DNA 
contacts  are  Y25  and  F49.  A  number  of  NOESY  peaks  were 
also  seen  between  K50  and  the  DNA,  and  these  contacts 
are  discussed  further  below. 

A detailed picture of the protein-DNA interface is shown
in  Figure  5.  This  figure  illustrates  the  orientations  of  some 
of  the  side  chains  that  are  important  in  DNA  binding, 
particularly  within  the  third  helix  (the  specific  atom  contacts 
are indicated in Figure 5). Figure 1b outlines the numbering
of the DNA used in the following discussion. Figure 5a
illustrates the protein-DNA NOE contacts seen within the
N-terminal  arm.  NOE  contacts  were  seen  in  the  minor  groove 
between  R2,  R3,  and  R5  and  DNA  residues  A72,  T74,  and 
G89.  Although  the  NOESY-derived  distance  constraints 
indicate  contact  between  residues  R3  and  R5  and  the  minor 
groove, 15N relaxation data (unpublished) indicate that this
region of the N-terminus does retain some degree of mobility;
similar results were reported for the Even-skipped homeodomain,
on the basis of refined atomic B factors (30). Broad
line widths were observed for the backbone NH resonances
of His7 and Phe8, which are indicative of slow time scale
motions in this region of the homeodomain and could
render NOEs from these residues to the DNA
undetectable.


In  the  second  helix,  R31  has  a  NOE  contact  from  the  EE 
position  to  G82  Q5'  (Q  refers  to  a  pseudoatom  representation, 
to indicate that stereospecific assignment of the 5' and 5"
protons was not made), as can be seen in Figure 5b. HBPLUS
analysis (96) indicates that R31 is making a hydrogen bond
contact  with  the  phosphate  backbone  of  this  nucleotide.  In 
the  loop  between  helices  1  and  2,  Y25  FF  is  making  NOE 
contacts  with  G81  Q5'  and  G82  4'H.  In  the  third  helix,  V47 
Qγ1 is making conserved NOE contacts to A72 Q5', A73
2'H and 2"H, and T74 6H. Residue W48 has a NOE contact
between Hα and A72 8H. R44 H6 is making contact with
DNA  proton  A73  8H,  R44  Q?  with  T74  4'H,  and  R44  Q6 
with  T74  2"H.  HBPLUS  analysis  indicates  that  R44  is 
making  a  backbone  hydrogen  bond  contact  to  the  phosphate 
of  T74.  Residues  44,  47,  and  48  are  illustrated  in  Figure  5c. 
In  the  third  helix,  R46  and  R52  appear  to  be  making 
conserved  contacts  with  the  DNA  backbone.  R46  extends 
upward,  and  R52  extends  downward  to  make  these  contacts 
(Figure 5d). R46 Qδ has a NOE contact with G84 8H. R52
has a NOE contact with T71 Q5'. R53 Qγ makes a NOE
contact  with  G83  4'H.  All  of  these  NOEs  could  be  due  to 
the  close  proximity  of  the  atoms  while  the  side  chains  form 
hydrogen  bonds  with  backbone  phosphate  groups.  NOEs  are 
also  seen  between  F49  and  G83  4'H  and  Q5'.  K50  will 
be  discussed  further  below,  but  as  can  be  seen  in  Figure  5d, 
there  are  NOE  contacts  between  the  K50  side  chain  and 
atoms from G83 and G84.

Figure 6: (a) View of the 20 conformers, with only the K50, G83, and G84 backbone and side-chain atoms shown to illustrate the extent
of disorder of the K50 side chain, implying possible mobility of this side chain in interacting with the DNA. K50 atoms are shown in blue,
and G83 and G84 atoms are shown in pink. Backbone atoms are bolder than side-chain atoms. (b) Strips from an H(CCO)NH-TOCSY
spectrum showing proton resonances for the side chains of K58 and K50. Line broadening of resonances in the K50 side chain, leading to
the weak or missing signals, is indicative of possible motion of this side chain. The asterisks in the K50 slice indicate peaks breaking
through from an adjacent 15N plane. The sole K50 peak is for the Hα proton resonance.


Other  residues  that  were  observed  in  the  calculated 
structures  to  be  in  close  contact  with  the  DNA,  but  without 
NOEs  being  seen  in  the  NMR  data,  are  N51,  K55,  R57,  and 
K58.  N51  is  nearly  invariant  among  homeodomains  (82)  and 
is  found  herein  to  make  the  same  highly  conserved  interaction 
within  the  major  groove  with  base  A73.  This  residue  has 
been  shown  in  crystal  structures  to  form  a  pair  of  hydrogen 
bonds  with  this  adenine  at  the  N7  and  N6  positions,  while 
NMR  studies  have  indicated  possible  rapidly  interchanging 
conformations  (32,  97).  NMR  studies  have  shown  this  close 
interaction, but no NOEs are seen, possibly due to line-broadening
effects (97). While no NOEs are seen between
K55,  R57,  or  K58  and  the  DNA,  HBPLUS  analysis  of  the 
complex  indicates  that  there  are  possible  interactions  present. 
K55  may  be  forming  a  hydrogen  bond  with  the  phosphate 
of  T71.  R57  may  be  contacting  the  phosphate  of  G84.  K58 
may  be  contacting  the  phosphate  of  C70.  Due  to  the  usual 
sensitivity  limitations  in  the  edited/filtered  NMR  experiments 
employed  to  identify  intermolecular  NOEs,  it  is  quite  likely 
that  a  number  of  anticipated  NOEs  fall  at  or  below  the 
threshold  for  detection. 

The  Role  of  Lysine  at  Position  50.  No  previous  structures 
have  been  described  for  any  native  K50  class  homeodomains. 
However,  the  X-ray  crystal  structure  of  the  Q50K  mutant  of 
the  Engrailed  homeodomain  bound  to  DNA  has  been 
reported  (7),  and  the  side  chain  of  K50  was  found  to  project 
into  the  major  groove  of  the  DNA,  making  hydrogen  bond 
contacts with the O6 and N7 atoms of the guanines at base
pairs  5  and  6  of  the  complementary  strand  of  the  TAATCC 
binding site. Our structure of the PITX2 homeodomain marks
the first experimentally determined structure of a native K50
class  homeodomain,  and  is  important  for  validating  results 
seen  in  the  studies  of  non-native  proteins.  When  binding  to 
the  consensus  site,  the  position  of  K50  is  very  similar  to 
that  seen  in  the  EnQ50K  structure,  with  the  side  chain  of 
K50  extending  outward  and  making  contacts  with  the  two 
guanines  adjacent  to  the  TAAT  core  sequence  on  the 
antisense  strand  (Figure  5d).  NOEs  are  observed  between 
the K50 Qγ and the Q5' protons of G83 and G84. The Nζ of
the K50 side chain is likely making hydrogen bond contacts
to the O6 and N7 atoms of G83 and G84, according to
analysis  by  HBPLUS  (96). 
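
Programs such as HBPLUS assign hydrogen bonds from geometry. As a rough illustration of that kind of test (not a reimplementation of HBPLUS and not the calculation performed in this study), the sketch below checks a donor-hydrogen-acceptor triplet against distance and angle cutoffs. The default cutoffs (donor-acceptor 3.9 Å, hydrogen-acceptor 2.5 Å, donor-hydrogen-acceptor angle of at least 90°) are assumptions modeled on commonly cited HBPLUS defaults and should be verified against the program documentation. The coordinates are invented for illustration and are not taken from the PITX2 structure.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    cos_value = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_value))


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_da=3.9, max_ha=2.5, min_dha=90.0):
    """Geometric hydrogen-bond test with assumed (hypothetical) cutoffs."""
    return (
        math.dist(donor, acceptor) <= max_da
        and math.dist(hydrogen, acceptor) <= max_ha
        and angle_deg(donor, hydrogen, acceptor) >= min_dha
    )


# Made-up coordinates (Å): a lysine-like N-H donor and a guanine-like acceptor.
n_zeta = (0.0, 0.0, 0.0)
h_zeta = (1.0, 0.0, 0.0)
linear_acceptor = (2.9, 0.0, 0.0)   # near-ideal, linear arrangement
distant_acceptor = (5.0, 0.0, 0.0)  # too far away
bent_acceptor = (0.5, 1.5, 0.0)     # close, but the angle is too acute

print(is_hydrogen_bond(n_zeta, h_zeta, linear_acceptor))
print(is_hydrogen_bond(n_zeta, h_zeta, distant_acceptor))
print(is_hydrogen_bond(n_zeta, h_zeta, bent_acceptor))
```

Applied to each conformer of an NMR ensemble, a test of this kind would report how often a given contact, such as K50 Nζ to guanine O6 or N7, satisfies the criteria, which is one way to express a "likely" hydrogen bond for a mobile side chain.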

NMR  spectroscopy  allows  one  to  obtain  information  about 
the  mobility  of  the  protein  backbone  and  side  chains.  A  key 
finding  in  the  present  study  was  that  the  side  chain  of  K50 
potentially  mediates  recognition  by  fluctuating  between 
multiple  conformations.  The  conformational  heterogeneity 
can  be  seen  in  Figure  6a.  This  preliminary  evidence  is  based 
on  averaging  of  NOEs  and  broadening  of  resonances  for  this 
residue.  The  averaging  of  NOEs  was  dealt  with  as  ambiguous 
distance  constraints  within  the  structure  calculation  in 
CYANA,  and  these  constraints  were  satisfied  in  all  structures 
of  the  family.  When  results  from  an  H(CCO)NH-TOCSY 
experiment  are  compared  between  the  K50  and  K58  side 
chains  (Figure  6b),  peaks  are  easily  seen  for  the  K58  side- 
chain resonances, but only the Hα resonance is seen for the
K50  side  chain.  The  extra  peaks  in  the  K50  strip  of  Figure 
6b  are  from  another  residue  on  an  adjacent  nitrogen  plane 
and  are  strong  enough  to  show  up  as  residual  peaks  on  this 
plane. The broadening of resonances for this side chain made
it difficult to assign using typical heteronuclear-edited NMR
spectra.  Instead,  assignments  were  made  using  NOESY 
spectra  and  eliminating  assignments  from  nearby  residues, 
until  only  K50  resonances  were  left.  In  principle,  it  is  possible 
that  the  line  broadening  of  K50  side-chain  resonances  could 
be  caused  by  ring  current  effects  from  aromatic  bases  in  the 
DNA,  or  by  mobility  of  other  nearby  protons  in  the  DNA 
binding  site.  However,  no  anomalous  line  broadening  was 
observed  for  DNA  proton  resonances  in  the  vicinity  of  the 
K50  side  chain.  In  addition,  results  similar  to  those  reported 
here  have  been  seen  in  other  DNA-binding  proteins  in  which 
side-chain  mobility  appears  to  cause  line  broadening  of 
resonances (vide infra) (32, 97–99). These results, in
combination  with  the  multiple  conformations  observed  for 
K50  in  EnQ50K,  provide  compelling  evidence  that  the  side 
chain of K50 is mobile. Preliminary 15N relaxation measurements
of the homeodomain backbone dynamics (unpublished)
did not show anything unusual in the region of
K50. Some degree of side-chain mobility at the protein-DNA
interface would be expected to confer an entropic
advantage  for  binding  to  the  DNA.  It  has  been  estimated 
previously  that  the  entropic  cost  of  keeping  a  lysine  side 
chain static during binding is 3 kcal mol⁻¹ (100). This
possible  entropic  component  cannot  be  assessed  until  a 
detailed  thermodynamic  study  is  performed  for  this  complex. 
This  hypothesis  of  K50  side-chain  mobility  will  be  explored 
further  in  the  future,  but  for  now,  it  is  complementary  to  the 
data  for  the  EnQ50K  mutant  (7).  The  crystal  structure 
indicates  that  there  are  two  alternate  conformations  for  the 
K50  side  chain,  one  in  which  the  side  chain  points  to  base 
pairs  5  and  6,  and  one  in  which  the  side  chain  is  oriented 
slightly  more  toward  base  pair  5.  It  must  be  pointed  out  that 
this  X-ray  structure  was  solved  at  cryogenic  temperatures, 
so  there  is  the  possibility  that  there  is  a  freezing  out  of  a 
subset  of  conformational  populations.  It  is  possible  that  these 
results  indicate  two  static,  nearly  isoenergetic  conformations 
for  this  side  chain  of  EnQ50K,  rather  than  a  dynamic 
fluctuation between two conformations. The B-factors in this
case provide no evidence for distinguishing between these
possibilities. The B-factors are low for the side chain of K50
in the 1.9 Å crystal structure of EnQ50K, varying over the
range 20.8 to 23.6 Å², which are the lowest values in the protein,
aside from the aromatic ring of F49. B-factors of about 20 Å²
indicate uncertainties of about 0.5 Å. Typically, B-factors
of 60 Å² or greater in high-resolution crystal structures indicate
possible  mobility  of  a  side  chain.  So,  according  to  the  crystal 
results,  the  side-chain  position  of  K50  is  well  defined  in  the 
crystal,  in  contrast  to  the  possible  mobility  of  the  K50  side 
chain  seen  in  our  results.  The  true  nature  of  the  side-chain 
conformation  and  dynamics  may  involve  a  combination  of 
the states revealed by the two different experimental approaches,
so that the K50 side chain has two predominant
conformations,  and  fluctuates  between  these  alternatives. 
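
The correspondence between a B-factor of about 20 and an uncertainty of about 0.5 Å follows from the standard isotropic relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square atomic displacement along one direction. The short sketch below applies that relation to the values quoted above; it assumes the B-factors are isotropic and expressed in Å², and the function name is illustrative.

```python
import math


def rms_displacement(b_factor):
    """RMS displacement (Å) for an isotropic B-factor (Å^2): B = 8*pi^2*<u^2>."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# B-factor values quoted in the text for EnQ50K, plus the 60 threshold.
for b in (20.0, 20.8, 23.6, 60.0):
    print(f"B = {b:5.1f} Å^2  ->  rms displacement = {rms_displacement(b):.2f} Å")
```

On this scale, the K50 side-chain values of 20.8 to 23.6 correspond to displacements only slightly above 0.5 Å, whereas a B-factor of 60 corresponds to roughly 0.9 Å.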

Although  a  more  detailed  characterization  of  the  side-chain 
dynamics in the PITX2-DNA interface must await data from
experimental NMR relaxation measurements and molecular
dynamics simulations, substantial support for our observation
of flexibility in the K50 side chain already exists from studies
of related systems. Significant broadening of side-chain
resonances at the protein-DNA interface was observed in
studies of homeodomain-DNA complexes of Antennapedia
(97) and NK-2 (32). Moreover, flexibility in lysine side
chains appears to be a significant feature of various modes
of protein-DNA interactions. Foster and co-workers (98)
have reported clear indications of substantial conformational
fluctuations  in  lysine  side  chains  in  the  interface  of  the  zinc- 
finger  protein  TFIIIA  with  its  DNA  binding  site,  including 
the  observation  of  broadened  resonances  and  multiple  NOE 
contacts  that  strongly  suggest  rapid  conformational  averaging. 
Significant  line-broadening  effects  were  also  reported  for  a 
lysine  side  chain  in  NMR  studies  of  the  telomeric  DNA 
complex of TRF1 (99). In addition to NMR studies, molecular
dynamics  simulations  of  wild-type  (31)  and  a  Q50K  mutant 
(33)  of  the  Antennapedia  homeodomain  bound  to  DNA 
provide further evidence in support of a dynamic homeodomain-DNA
interface. For example, the Q50K simulations
indicated that the side chain of K50 exhibited very
pronounced  mobility,  with  several  arrangements  of  the  lysine 
side-chain  torsion  angles  allowing  for  frequent  contacts,  both 
hydrogen-bonding  and  hydrophobic  interactions,  with  base 
pairs  5  and  6  in  the  TAATCC  binding  site.  In  this  case,  the 
lysine  in  the  Q50K  mutant  provides  both  entropic  and 
enthalpic contributions to protein-DNA affinity. A general
observation arising from the known structures of homeodomain-DNA
complexes is that the region of position 50
is  not  in  intimate  contact  with  the  bases  of  the  major  groove. 
Such  a  relatively  unrestrained  arrangement  allows  for 
relatively  long-range  contacts  to  be  formed  in  multiple, 
possibly  isoenergetic  ways. 

Previous studies have shown that the lysine at position
50 is critical for binding to the TAATCC DNA binding
site (82). In contrast, homeodomains with a glutamine at
position  50  bind  to  TAATGG  sites  with  a  higher  affinity. 
The  glutamine  at  position  50  appears  to  have  a  more  modest 
role.  When  this  residue  is  mutated  to  an  alanine,  the  Q50A 
mutant  has  an  affinity  and  specificity  very  similar  to  those 
of  the  wild-type  protein,  but  when  mutated  to  a  lysine,  the 
specificity  changes  (5).  These  studies,  along  with  the  current 
results,  indicate  that  the  interaction  between  K50  and  the 
two  guanines  at  positions  5  and  6  is  vital  to  the  affinity  and 
specificity  of  the  protein.  The  current  model  for  specific 
homeodomain-DNA interactions consists of a fluctuating
network  of  hydrogen  bonds  formed  between  polar  groups 
of  the  protein  and  the  DNA,  and  the  interfacial  water  (31). 
These  interactions  are  further  complemented  by  hydrophobic 
contacts. The possible fluctuating hydrogen-bonding interactions
between K50 and the DNA and subsequent strict
specificity of this class of homeodomains are consistent with
this model. Investigation of side-chain-base interactions has
shown that lysine-guanine interactions are very common
(101).  K50  homeodomains  may  have  such  a  strong  specificity 
for  the  TAATCC  site  because  the  orientation  of  the  lysine 
is  in  an  ideal  position  for  the  charged  group  to  make 
hydrogen-bonding contacts with the two guanines. In contrast,
these hydrogen bonds cannot be made with cytosines,
which are in these positions for the Q50 binding site
TAATGG (101). The N7 of guanine is the most electronegative
region of the major groove (102), and the favorable
interactions  that  the  lysine  can  make  with  both  guanines  in 
a  mobile  model  may  determine  why  K50  homeodomains  are 
so  specific  for  the  TAATCC  binding  site,  rather  than  other 
binding  sites. 

Analysis  of  Residues  Mutated  in  Rieger  Syndrome.  There 
have been 9 mutations found in the PITX2 homeodomain

Table 2. PITX2 Homeodomain Mutations (35, 45, 48, 103–109)

mutation   disease                 properties
L16Q       Rieger syndrome         unstable, no activation, no DNA binding
T30P       Rieger syndrome         no activation, only binds consensus
R31H       iridogoniodysgenesis    reduced activation, only binds consensus site
V45L       Rieger syndrome         <10-fold reduction in DNA binding, 200% increase in activation
R46W       iris hypoplasia         reduced binding to nonconsensus site, reduced activation
K50E       Rieger syndrome         no DNA binding or activation, dominant negative
K50Q       Rieger syndrome         not known
R52C       Rieger syndrome         not known
R53P       Rieger syndrome         no nonconsensus binding, no activation, dominant negative

Figure 7: Ribbon diagram of the PITX2 homeodomain-DNA
complex  showing  the  positions  of  the  side  chains  for  the  residues 
known  to  be  mutated  in  Rieger  syndrome  and  related  disorders. 


in Rieger syndrome and related disorders (35, 45, 48, 103–109).
These mutations, along with their known biochemical
effects,  are  listed  in  Table  2.  The  consequences  of  these 
mutations  vary.  Some  mutations  cause  a  total  lack  of  DNA 
binding,  while  others  can  still  bind  DNA,  albeit  with  a 
decreased  affinity.  These  consequences  are  directly  reflected 
in  the  severity  of  the  disease.  A  model  of  the  PITX2 
homeodomain  structure  was  created  previously  by  threading 
analysis,  which  allowed  predictions  to  be  made  regarding 
the  role  of  Rieger  syndrome  mutations  in  PITX2  dysfunction, 
although  it  is  not  necessarily  an  indication  of  the  true 
molecular  structure  (83). 

The  orientations  of  the  side  chains  altered  in  Rieger 
syndrome  patients  are  shown  in  Figure  7.  Analysis  of  these 
orientations  provides  insights  into  the  role  of  each  side  chain, 
and  how  mutations  in  these  positions  could  alter  the  structure 
and  function  of  the  protein.  Future  studies  will  focus  on 
analyzing  the  mutant  proteins  by  NMR  spectroscopy.  The 
side  chain  of  highly  conserved  L16  points  toward  the  interior 
hydrophobic  core  of  the  protein,  and  is  probably  involved 
in  stabilizing  both  the  formation  of  this  core  and  the  overall 
tertiary  structure  of  the  protein;  the  L16Q  mutation  would 
therefore be expected to destabilize or disrupt this hydrophobic
core. The side chain of T30 extends outward from
the second helix, away from the DNA, so it does not appear
to  play  a  role  in  DNA  recognition.  Biochemical  studies  have 
shown  that  this  mutant  can  still  bind  consensus  DNA,  but 
no  longer  activates  transcription  of  a  reporter  gene  (25).  This 
residue  may  perform  an  activation  function  by  interacting 
with  other  proteins,  which  could  easily  be  disrupted  by  the 
effects  of  the  proline  mutation.  An  interesting  observation 
is  that,  in  many  homeodomains,  residue  30  is  involved  in  a 
salt  bridge  to  residue  19,  whereas  this  is  not  possible  for 
PITX2.  The  side  chain  of  R31,  as  described  above,  appears 
to  contact  the  DNA  backbone  phosphate  of  G82.  Therefore 
mutating  this  residue,  even  to  another  positively  charged 
residue,  may  disrupt  this  interaction  with  the  DNA  and  may 
disrupt  a  possible  salt  bridge  with  E42.  The  histidine  side 
chain  at  this  position  in  the  mutant  may  not  have  favorable 
steric  interactions  with  the  DNA.  The  side  chain  of  V45 
points  toward  the  interior  of  the  protein  from  the  third  helix. 
Like L16, this side chain appears to be involved in formation
of  the  hydrophobic  core  of  the  protein.  Unlike  the  L16Q 
mutant,  the  V45L  mutant  has  the  unusual  characteristic  of 
having  a  greatly  heightened  activation  function,  while  having 
a  reduced  DNA-binding  ability.  It  is  possible  that  this  mutant 
affects  the  protein  in  a  way  that  alters  these  two  functions 
separately,  with  a  different  fold  of  the  protein  that  allows 
for a more efficient interaction with other proteins. For
example,  altered  interactions  of  the  PITX2  homeodomain 
with  the  C-terminal  tail  of  the  full-length  PITX2  protein 
could  have  differential  effects  on  DNA  binding  and  activation 
(110).  The  DNA-binding  functions  of  R46,  K50,  R52,  and 
R53  were  discussed  in  detail  above.  Mutating  these  residues 
would  disrupt  many  favorable  interactions  with  the  DNA, 
and  biochemical  studies  have  indicated  that  these  mutations 
interfere  with  DNA  binding.  Overall,  these  results  are  similar 
to  the  threading  analysis,  but  provide  a  more  direct  and 
detailed  understanding  of  the  roles  of  these  residues. 

Many  of  the  residues  in  the  PITX2  homeodomain  found 
to  be  altered  in  Rieger  syndrome  are  involved  in  contacting 
the  DNA.  Other  residues  are  involved  in  forming  the 
hydrophobic  core  of  the  protein,  which  stabilizes  the  global 
fold.  The  analysis  of  mutations  causing  structural  changes 
could  be  very  relevant  for  the  understanding  and  prediction 
of  dysfunctions  caused  by  mutations  in  homeodomains,  as 
several  homeodomains  are  known  to  be  involved  in  various 
diseases  (111—115). 

CONCLUDING  REMARKS 

The  structure  previously  determined  for  the  Engrailed 
Q50K  mutant  (7)  provided  some  interesting  insights  into  the 
possible  role  of  lysine  at  position  50.  The  presence  of 
hydrogen  bonds  between  position  50  and  the  DNA  had  not 
been seen previously. But many questions remained unanswered
concerning the role of lysine in a native K50
homeodomain. For example, the Engrailed mutant has a
dissociation constant of 0.0088 nM (7), representing an
unusually high affinity for homeodomain-DNA interactions.
Previous  studies  have  indicated  that  proteins  with  excessively 
high  affinities  for  DNA  or  RNA  can  cause  functional  defects 
(116, 117).  The  unusually  high  affinity  of  EnQ50K  for  DNA 
suggests  that  it  may  have  properties  that  make  it  different 
from  natural  K50  homeodomains.  Unlike  the  Engrailed 
mutant,  the  native  K50  class  homeodomains  PITX2  and 
Bicoid have properties that make them unstable in free forms,
and have affinities within the normal nanomolar range (25,
26,  Supporting  Information).  When  DNA  is  not  present,  these 
proteins  will  irreversibly  aggregate  and  precipitate  out  of 
solution  at  micromolar  concentrations.  These  differences  in 
biochemical  properties  between  the  mutant  and  natural  K50 
proteins suggest the importance of understanding the structural
properties of lysine at position 50 in the context of a
native  K50  class  protein. 
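
To put the affinity difference on an energy scale, a dissociation constant can be converted to a standard binding free energy with ΔG° = RT ln(Kd), taking a 1 M standard state. The sketch below is illustrative only: the temperature of 298.15 K is an assumption, and the 1 nM value stands in for "the normal nanomolar range" rather than being a measured Kd for PITX2 or Bicoid. The last function expresses a free-energy change as a fold change in Kd, using the 3 kcal mol⁻¹ entropic cost estimate for a lysine side chain cited earlier in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from Kd in mol/L (1 M standard state)."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def kd_fold_change(delta_g_kcal, temperature_k=298.15):
    """Fold change in Kd corresponding to a free-energy difference (kcal/mol)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


dg_enq50k = binding_free_energy(0.0088e-9)  # EnQ50K, Kd = 0.0088 nM (from the text)
dg_nanomolar = binding_free_energy(1.0e-9)  # hypothetical 1 nM binder, for comparison
fold_for_3_kcal = kd_fold_change(3.0)       # 3 kcal/mol lysine estimate (ref 100)

print(f"EnQ50K:        {dg_enq50k:.2f} kcal/mol")
print(f"1 nM binder:   {dg_nanomolar:.2f} kcal/mol")
print(f"difference:    {dg_nanomolar - dg_enq50k:.2f} kcal/mol")
print(f"3 kcal/mol corresponds to a ~{fold_for_3_kcal:.0f}-fold change in Kd")
```

Under these assumptions, the gap between a picomolar-range and a nanomolar-range binder is a little under 3 kcal/mol, which is comparable in size to the single-side-chain entropic estimate quoted above.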

But  the  question  still  remains  as  to  what  causes  these 
differences.  The  authors  of  the  EnQ50K  structure  found  that 
the  mutant  bound  to  DNA  more  tightly  and  specifically  than 
did  the  native  protein  (7).  They  hypothesized  that  this  was 
due  to  very  specific  hydrogen  bonds  between  the  K50  side 
chain  and  the  guanines  at  positions  5  and  6  on  the  antisense 
strand. In our study, we found that the native K50 homeodomain
PITX2 has a slightly different tertiary structure, with
helix 1 being closer to helix 2 than in other homeodomains,
including the EnQ50K mutant. Helix 3 is angled about 0.5 Å
closer to the N-terminus of helix 1 and C-terminus of helix
2 than in EnQ50K. This appears to cause a difference in the
way  that  helix  1  and  helix  3  can  interact,  and  previous  studies 
have  shown  that  this  interaction  between  the  helices  stabilizes 
the  global  fold  of  the  homeodomain  (3,  81,  82).  Another 
Q50K  mutant,  this  time  of  Fushi  tarazu,  is  unable  to  bind 
nonconsensus  DNA  sites  that  PITX2  and  Bicoid  are  able  to 
recognize  (27).  It  is  currently  unknown  whether  the  Engrailed 
mutant  can  bind  nonconsensus  sites.  These  differences  in 
affinity  and  specificity  may  involve  any  of  the  differing 
residues  between  these  homeodomains.  Positions  50  and  54 
have  been  shown  to  be  involved  in  recognizing  nonconsensus 
DNA  sites  (22),  and  it  is  possible  that  other  residues  are 
also  involved.  Within  the  third  helix,  position  52  of  Engrailed 
is  a  lysine.  In  PITX2,  Bicoid,  and  Fushi  tarazu,  this  residue 
is  an  arginine.  We  do  not  know  whether  having  lysine 
residues  at  both  positions  50  and  52  could  contribute  to  the 
unnaturally  tight  binding  of  EnQ50K,  but  this  is  a  possibility. 

The  current  study  of  the  solution  structure  of  the  PITX2 
homeodomain reveals possible fluctuating interactions between
the K50 side chain and the DNA. It is possible that
this  mobile  side  chain  may  allow  the  protein  to  sample 
multiple  DNA  binding  sites,  and  enable  binding  to  the 
nonconsensus  sites,  though  at  a  slightly  lower  affinity.  It  will 
be  interesting  in  the  future  to  determine  if  other  natural  K50 
class  proteins  share  similar  properties  with  PITX2.  Future 
studies  will  focus  on  analyzing  Rieger  mutants  of  the  PITX2 
homeodomain,  and  analyzing  the  structural  features  of  this 
protein  when  bound  to  nonconsensus  DNA  binding  sites. 
This  will  allow  a  greater  understanding  of  the  roles  of  specific 
residues  in  consensus  and  nonconsensus  DNA  binding,  and 
a  greater  understanding  of  how  proteins  can  recognize 
multiple  DNA  sites  to  activate  transcription  of  genes. 

ACKNOWLEDGMENT 

We  would  like  to  especially  thank  Dr.  Jeffery  C.  Murray 
and  Dr.  Elena  V.  Semina  from  the  University  of  Iowa  for 
providing  the  original  pitx2  clone.  We  would  also  like  to 
thank Dr. Jack Howarth for computer support and maintaining
the NMR facility at the University of Cincinnati.


NOTE  ADDED  AFTER  ASAP  PUBLICATION 

This  paper  was  originally  published  4/26/05  with  an 
incorrect  entry  in  Table  1.  The  corrected  version  was  also 
published  4/26/05. 

SUPPORTING  INFORMATION  AVAILABLE 

Sequence alignment of homeodomains (Table S1), binding
affinity of the PITX2 homeodomain (Figure S1), HSQC with
the residues labeled (Figure S2), chemical shift assignments
for the PITX2 homeodomain and its DNA binding site
(Tables S2-S7), protein-DNA contacts (Table S8), and
Ramachandran plot for the PITX2 homeodomain (Figure S3).
This material is available free of charge via the Internet at
http://pubs.acs.org.
The solution structure of the homeodomain of the Drosophila morphogenic
protein Bicoid (Bcd) complexed with a TAATCC DNA site is described.
Bicoid is the only known protein that uses a homeodomain to regulate
translation, as well as transcription, by binding to both RNA and DNA
during early Drosophila development; in addition, the Bcd homeodomain
can recognize an array of different DNA sites. The dual functionality and
broad recognition capabilities signify that the Bcd homeodomain may
possess unique structural/dynamic properties. Bicoid is the founding
member of the K50 class of homeodomain proteins, containing a lysine
residue at the critical 50th position (K50) of the homeodomain sequence, a
residue required for DNA and RNA recognition; Bcd also has an arginine
residue at the 54th position (R54), which is essential for RNA recognition.
Bcd is the only known homeodomain with the K50/R54 combination of
residues. The Bcd structure indicates that this homeodomain conforms
to  the  conserved  topology  of  the  homeodomain  motif,  but  exhibits  a 
significant  variation  from  other  homeodomain  structures  at  the  end 
of  helix  1.  A  key  result  is  the  observation  that  the  side-chains  of  the 
DNA-contacting  residues  K50,  N51  and  R54  all  show  strong  signs  of 
flexibility  in  the  protein-DNA  interface.  This  finding  is  supportive  of  the 
adaptive-recognition  theory  of  protein-DNA  interactions. 

©  2006  Elsevier  Ltd.  All  rights  reserved. 

Keywords:  bicoid;  homeodomain;  DNA/RNA-binding  protein;  NMR; 
molecular  dynamics 


Introduction 

Homeodomains  are  an  evolutionarily  conserved 
class  of  DNA-binding  domains  that  are  ubiquitous 
in  multicellular  organisms.1-6  Over  1060  unique 
homeodomain  sequences  from  112  different  species 
of  plants  and  animals  have  been  isolated  and 
sequenced.1  They  are  found  in  transcriptional 
regulatory  proteins,  such  as  engrailed  (fly),  MATa 
(yeast), and PITX2 (human), and are involved in
various processes, including the spatial and temporal
delineation of developmental regions in
the embryo,4 cell-type specification and differentiation,7
and the maintenance of embryonic stem
cells.8 Bicoid is a homeodomain-containing Drosophila
transcription factor that directs formation of
the  anterior-posterior  axis  in  the  developing 
embryo,9-12  through  recognition  of  enhancer 
elements  of  gap  and  pair-rule  genes,  such  as 
hunchback,  knirps,  and  even-skipped.13-18 

The  broad  goal  of  understanding  the  basic 
mechanisms of DNA information storage/retrieval
and,  consequently,  the  basis  by  which  proteins 
recognize  distinct  DNA  sequences,  is  evident  in  the 
history  of  homeodomain  research.  The  prevalence 
of  the  homeodomain,  combined  with  its  small  size 
(60  amino  acid  residues)  and  functionality  in  the 
absence  of  the  rest  of  the  protein,  have  placed  the 
homeodomain  in  a  unique  position  to  serve  as 
a valuable tool for probing the basis of protein-DNA
interactions, and have made it a long-standing
subject of functional,19-33 and structural
studies, by both NMR1,34-46 and X-ray crystallography.1,47-60
These studies have revealed a conserved
global fold consisting of three α helices and a
flexible N-terminal arm that becomes more ordered
upon DNA binding.46 Sequence-specific recognition
of the DNA-binding site (a double-stranded
DNA  hexamer  defined  loosely  by  a  TAAT  core  in 
the  sense  strand)61  is  mediated  by  amino  acid 
residues  located  in  the  third  "recognition  helix",55,62 
that  bind  to  bases  in  the  major  groove,  and  the 
N-terminal  arm,  which  wraps  around  the  DNA 
double  helix  and  makes  contacts  with  bases  in  the 
adjacent  minor  groove.55,60  However,  despite 
numerous  studies  showing  the  conserved  global 
fold and general binding orientation of homeodomains
on the DNA, fundamental questions
remain regarding the specific nature of interactions
between amino acid side-chains and nucleic acids
during recognition of binding sites.36,60

Previous  research  has  shown  that  position  50  of 
the  homeodomain,  located  in  the  "recognition"  helix, 
plays  an  important  role  during  recognition  of 
specific  DNA-binding  sites.  This  position  is  occupied 
most  frequently  by  glutamine  but  can  be  occupied  by 
alanine,  serine,  cysteine,  lysine  or  isoleucine,  among 
others.  Position  50  plays  a  fundamental  role  in 
recognizing  the  bases  immediately  3'  to  the  TAAT 
core  (TAATNN)  and,  to  a  lesser  extent,  position  4 
of the TAAT core (TAATNN), and has been the
focus  of  many  structural36,5  ,52,58  and  functional 
studies.15,19,21,26,30,63  Homeodomains  with  glutamine 
at  position  50  (Q50)  recognize  TAATTA,  TAATTG,  or 
TAATGG  sites,29,64  while  those  with  lysine  at 
position  50  (K50),  like  Bicoid,  recognize  TAATCC 
or TAAGCT. Two structural studies of K50 homeodomains
have been done, one by Tucker-Kellogg
et al. of the engrailed Q50K mutant crystal structure,58
and the second, a recently published solution
structure of the native K50 PITX2 homeodomain by
our group.36 Binding studies of the engrailed Q50K
mutant indicated that the presence of lysine at
position 50 conferred a DNA-binding affinity in the
picomolar range (KD = 8.8 × 10⁻¹² M), considerably
higher when compared to other non-K50 homeodomains
(KD range ~10⁻⁹-10⁻¹⁰ M).20,24,27,31,42
Tucker-Kellogg et al. associated this increase in
binding  affinity  with  the  specific  hydrogen  bond 
contacts  made  between  K50  and  the  DNA  in  their 
crystal  structure.58  Binding  analyses  performed  with 
the  Bicoid  and  PITX2  native  K50  homeodomains 
revealed  that  the  two  native  K50  homeodomains 
retained  binding  affinities  similar  to  those  seen  in 
other homeodomains (see Chaney et al.36 and see S1
of  the  Supplementary  Data),  in  contrast  to  the  results 
reported  by  Tucker-Kellogg  et  al.,58  and  indicating 
that  the  engrailed  Q50K  homeodomain  may  not 
reflect  the  behavior  of  native  K50  homeodomains. 
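
To put the affinity comparison above on an energy scale, the dissociation constants can be converted to standard binding free energies with ΔG° = RT ln KD. The sketch below is illustrative only: the temperature of 298.15 K and the 1 M standard state are assumptions (the text does not state the measurement conditions), and the function and dictionary names are hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
TEMP_K = 298.15     # assumed temperature; not stated in the text


def binding_free_energy(kd_molar: float, temp_k: float = TEMP_K) -> float:
    """Standard binding free energy (kcal/mol) for a dissociation constant in M."""
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


# KD values quoted in the text (in M)
kd_values = {
    "engrailed Q50K": 8.8e-12,
    "non-K50 homeodomains (tight end of range)": 1e-10,
    "non-K50 homeodomains (weak end of range)": 1e-9,
}

dg = {name: binding_free_energy(kd) for name, kd in kd_values.items()}
for name, value in dg.items():
    print(f"{name}: {value:.1f} kcal/mol")

# Extra stabilization of the Q50K complex relative to each end of the range
reference = dg["engrailed Q50K"]
for name, value in dg.items():
    if name != "engrailed Q50K":
        print(f"Q50K vs {name}: {value - reference:.1f} kcal/mol tighter")
```

Under these assumptions the picomolar KD corresponds to roughly 1.4 to 2.8 kcal/mol of additional binding free energy relative to the nanomolar to sub-nanomolar range, which is on the order of one or two hydrogen bonds.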

Additional  evidence  for  the  complex  role  of  the 
position  50  residue  comes  from  the  evolutionary 
co-variation that has been observed among the
DNA-contacting amino acid residues of the recognition
helix, specifically between positions 50 and
54.22,63  Analysis  of  the  homeodomain  sequences 
available  through  the  NHGRI  homeodomain 
resource1 reveals that when position 50 is occupied
by  a  glutamine  residue  (~63%  of  all  known 
homeodomains),  many  different  residues  can 
occupy  position  54,  with  the  majority  containing 
either  alanine  (~41%)  or  methionine  (~40%)  at  this 
position. K50 homeodomains (~6% of listed homeodomains)
have a stricter evolutionary requirement
for  position  54  (alanine  70%,  glutamine  26%). 
A  mutant  form  of  the  thyroid  transcription  factor 
with  an  unnatural  combination  of  amino  acid 
residues  at  positions  50  and  54  (Q50K,  Y54M)  fails 
to  bind  to  the  DNA  site  predicted  by  earlier  studies 
of  the  individual  contributions  of  these  residues  to 
DNA recognition (K50-TAATCC, M54-ATTAGG).22
These results suggest that the stricter evolutionary
covariance seen for K50 homeodomains is important
for maintenance of DNA binding. Studies of the
MATa226  and  fushi  tarazu27  homeodomains  also 
concluded  that  combinatorial  effects  from  amino 
acid  residues  in  positions  50  and  54  contribute  to  the 
specificity  of  a  particular  homeodomain.  These 
analyses of evolutionary covariance highlight comingled
functional and structural requirements for
amino  acid  residues  that  cannot  be  approximated  or 
dissected  by  analysis  of  individual  point  mutants 
like  Q50K,  without  also  considering  intramolecular 
interactions  in  the  unperturbed  system. 
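
The covariation analysis described above amounts to tabulating which residues occupy position 54 for each residue found at position 50. A minimal sketch of that tally follows. The sequences are synthetic placeholders built by a hypothetical helper (`make_toy_homeodomain`), not entries from the NHGRI resource, and the counts are chosen only to exercise the code; they do not reproduce the percentages reported in the text.

```python
from collections import Counter, defaultdict


def make_toy_homeodomain(res50: str, res54: str, length: int = 60) -> str:
    """Build a placeholder 60-residue sequence with chosen residues at 50 and 54."""
    seq = ["X"] * length
    seq[50 - 1] = res50   # homeodomain numbering is 1-based
    seq[54 - 1] = res54
    return "".join(seq)


def position54_given_50(sequences):
    """Count position-54 residues separately for each position-50 residue."""
    table = defaultdict(Counter)
    for seq in sequences:
        table[seq[50 - 1]][seq[54 - 1]] += 1
    return table


def as_percentages(counter):
    """Convert a Counter of residues into percentages of its total."""
    total = sum(counter.values())
    return {res: 100.0 * n / total for res, n in counter.items()}


# Synthetic input: aligned, gap-free homeodomain sequences are assumed
toy_sequences = (
    [make_toy_homeodomain("Q", "A")] * 4
    + [make_toy_homeodomain("Q", "M")] * 4
    + [make_toy_homeodomain("Q", "S")] * 2
    + [make_toy_homeodomain("K", "A")] * 3
    + [make_toy_homeodomain("K", "Q")] * 1
)

table = position54_given_50(toy_sequences)
for res50, counts in sorted(table.items()):
    print(res50, as_percentages(counts))
```

With a real alignment in place of the toy list, a narrow distribution for one position-50 class and a broad one for another would be the signature of the stricter covariance discussed here.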

Perhaps  the  most  convincing  argument  for 
additional  biophysical  and  thermodynamic  studies 
of  Bicoid  is  its  unusual  role  as  the  only  known  protein 
that  uses  a  homeodomain  to  regulate  translation,  as 
well  as  transcription,  by  binding  to  both  DNA  and 
RNA  during  early  Drosophila  development.  Bicoid 
represses the translation of another Drosophila transcription
factor, caudal, by binding to the 3' untranslated
region (UTR) of the caudal mRNA.65-69 Both K50
and R54 are required for RNA recognition. Interestingly,
of the 1063 unique homeodomain sequences
currently  listed,  only  4.5%  contain  arginine  in 
position  54  of  the  recognition  helix,1  and  even  more 
unusual  is  the  Bicoid  homeodomain,  which  is  the 
only  known  homeodomain  that  contains  a  K50/R54 
combination.1  While  a  K50A  mutation  abolishes  both 
DNA  and  RNA  recognition  by  the  homeodomain, 
mutation  of  R54  to  alanine  preferentially  affects 
recognition of both RNA sequences,70 and a nonconsensus
DNA site,23 raising questions about the
role of R54 in the discrimination of DNA/RNA
recognition  by  the  Bicoid  homeodomain. 

Despite numerous structural and functional analyses
of homeodomains, such studies have not
explained  the  effects  of  different  combinations  of 
amino  acids  on  DNA-binding  site  preference  and 
affinity.  Crystallographic  studies  have  generally 
indicated  that  there  are  several  conserved  and  stable 
interactions  at  the  protein-DNA  interface,  usually 
involving the nearly invariant N51.52,58 This stable
interaction is accompanied by multiple, significantly
populated conformations for the side-chain of the
amino acid residue in position 50. On the other hand,
NMR36,45,71 and molecular dynamics simulations37,71-73
have provided strong evidence for a
dynamic,  fluctuating  environment,  including  motion 
of  both  the  N51  and  position  50  side-chains,  and 
water  in  the  interface.  Billeter  and  co-workers  suggest 
that  a  network  of  short-lived  contacts  during  protein- 
DNA  interaction  reduces  the  entropic  cost  that  would 
be incurred by more rigid side-chain-DNA interactions.71
The study of the PITX2 K50 homeodomain
has recently lent support to this theory by showing
that the K50 side-chain resonances exhibit line
broadening and multiple conformations, indicative
of conformational flexibility.36 This dichotomy of a
dynamic and entropically favorable environment at
the protein-DNA interface seen by NMR and molecular
dynamics versus a more organized environment
seen  in  X-ray  studies  presents  an  appealing  frame  of 
reference  for  the  analysis  of  the  solution  structure  of 
the  Bicoid  homeodomain,  the  effect  of  evolutionary 
covariance  of  amino  acids  on  nucleic  acid 
recognition,  its  dual  role  as  both  a  transcriptional 
and  translational  regulatory  protein,  and  the  role  of 
molecular  dynamics  in  protein-DNA  recognition. 

To  clarify  the  role  of  key  amino  acids  during  DNA 
recognition  by  Bicoid,  the  solution  structure  of  the 
Bicoid  homeodomain/consensus  DNA  binding 
site  (TAATCC)  complex  was  solved  using  NMR 
spectroscopy.  This  structure,  combined  with  the 
solution  structure  of  the  PITX2  homeodomain  that 
our group has solved recently,36 provides the first
analyses of native K50 homeodomain structures,
and provides two distinct foundations for the future
study  of  the  kinetic,  thermodynamic,  and  motional 
contributions  of  amino  acids  to  both  DNA  and  RNA 
recognition  by  homeodomains. 

Results  and  Discussion 

Homeodomain  structure  determination 

We  have  determined  the  structure  of  a  67  amino 
acid  residue  construct  of  the  Bicoid  homeodomain 
bound  to  a  13mer  duplex  DNA  site  containing  the 
consensus 5'-TAATCC-3'/3'-ATTAGG-5' site. The
Bicoid  homeodomain  construct  we  used  contains 
an  additional  N-terminal  glycine  (a  product  of  the 
TEV  cleavage  site),  and  six  extra  C-terminal  amino 
acid  residues  that  belong  to  the  native  Bicoid 
protein  (added  to  improve  solubility  and  reduce 
aggregation)  (Supplementary  Data  S2).  As  was  the 
case  for  the  PITX2  homeodomain,  the  Bicoid 
homeodomain  precipitated  out  of  solution  if  it 
was  concentrated  in  the  absence  of  the  DNA;  this  is 
in  contrast  to  the  behavior  of  other  homeodomains 
that  have  been  studied  under  solution  conditions, 
where millimolar concentrations of the free homeodomain
have been reported. The KD for the binding
of the Bicoid homeodomain to the 13mer DNA
site used in this study was measured to be
4.28(±0.26) × 10⁻¹⁰ M (see Supplementary Data
S1). A total of 1724 restraints (Table 1) were used


Table 1. Structural statistics

A. Restraint statistics
  Total restraints                                      1724
  Total protein restraints                              1244
    Long range |i-j| ≥ 5                                 133
    Medium range 1 < |i-j| < 5                           250
    Short range |i-j| ≤ 1                                693
    Hydrogen bondᵃ                                        34
    Angle restraints (φ and ψ, ω)ᵃ,ᵇ                     134
  Total DNA restraints                                   447
    Intrabase                                            126
    Interbase                                             66
    Watson-Crick                                          55
    Angles (α,β,γ,δ,ε,ζ,χ)                               200
  Total protein-DNA intermolecular restraints             33

B. CYANA statistics
  Target function (Å²) (cycle 1/cycle 7)                 12.29/0.49
  Backbone RMSD (Å) (cycle 1/cycle 7)                    2.94/0.97

C. AMBER statistics (kcal/mol)
  Mean AMBER energy                                      -7670 ± 53
  Mean van der Waals energy                              -874 ± 13
  Mean electrostatic energy                              -2150 ± 12
  Mean RMSD from ideal
    Bond lengths (Å)                                     0.0072 ± 0.0005
    Bond angles (deg.)                                   4.303 ± 0.06
  Restraint violations
    Average per structure, > 0.3 Å                       9
    Maximum restraint violation (Å)                      <0.5
  Angle violations
    Average per structure, > 5 deg.                      3
    Maximum angle violation (deg.)                       <10

D. Mean RMS deviations from mean structure (Å) (bb/heavy atom)
  Protein (residues 10-58)                               1.39 ± 0.41/1.83 ± 0.40
  DNA (w/o 5'/3'-terminal bases)                         2.09 ± 0.58/1.89 ± 0.50
  DNA (consensus site only)                              1.78 ± 0.49/1.62 ± 0.41
  Protein (10-58 + consensus site)                       1.49 ± 0.36/1.79 ± 0.35

The statistics listed here are indicative of the 20 structures
deposited in the RCSB PDB (1ZQ3).
ᵃ Hydrogen bond and angle restraints were obtained via
mutual agreement from three separate analyses: derivation of
coupling constants from an HNHA experiment, TALOS chemical
shift analysis, and the presence of i to i+3 or i+4 NOEs.
ᵇ The ω angle restraints were obtained via the AMBER
program, and applied to all residues except the two proline,
the N-terminal glycine, and the C-terminal serine residues.

to calculate the structure of the Bicoid homeodomain-DNA
complex, including 1076 protein
distance restraints (133 long-range, 250 medium-range
and 693 short-range, ~22 restraints per amino
acid residue) and 134 torsional restraints (φ, ψ, and
ω angles) determined by consideration of the ³Jα-NH
coupling constants from analysis of a 3D HNHA
experiment, a Cα chemical shift analysis by the
CYANA program, 34 hydrogen bond restraints
predicted from i to i+3 and i to i+4 NOEs seen
between backbone HN protons, and a TALOS
chemical shift analysis.74 Seven cycles of torsion
angle dynamics and simulated annealing were
performed using CYANA 2.075 (Table 1B) to arrive
at  an  ensemble  of  20  structures  that  best  satisfied  the 
restraint  data.  In  general,  the  CYANA  calculation 
revealed  that  the  Bicoid  homeodomain  adopts  the 
three-helical  global  fold  that  has  been  seen  in 
structures  of  other  homeodomains  (Figure  1(a)). 
The  residues  10-21, 27-38,  and  43-60  form  helices  1, 
Figure 1. Structure of the Bicoid homeodomain before docking
to the DNA, solved using CYANA 2.0. (a) Ensemble of the 20
lowest energy conformers; the three helices and N and C termini
are labeled. The second and third helices form the canonical
helix-turn-helix protein motif. (b) Two views of the conserved
amino acid residues of the hydrophobic core of the mean structure
of the ensemble. The N terminus (F8, black), helix 1 (I13, L16,
F20, green), the turn between helices 1 and 2 (L26, magenta),
helix 2 (L34, L38, red), the turn between helices 2 and 3 (L40,
cyan), and helix 3 (V45, W48, F49, blue) all contain amino acid
residues that are important for formation of the hydrophobic
core. (c) Long-distance hydrogen bonds/electrostatic interactions
seen in the mean structure of the Bicoid homeodomain ensemble.
Hydrogen bonds (pink dotted lines) were seen in a majority of
conformers between Q12/L38 (green), E15/K37 (blue), E17/R52
(cyan), and R24/R53 (red).


2,  and  3,  respectively.  The  conserved  (through 
amino acid identity or similarity) residues F8, I13,
L16, F20, L26, L34, L38, L40, V45, W48, and F49 form
the hydrophobic core of the homeodomain
(Figure 1(b)), and provide interactions among the
N-terminal arm (F8), helix 1 (I13, L16, F20), the turn
between  helices  1  and  2  (L26),  helix  2  (L34,  L38),  the 
turn  between  helices  2  and  3  (L40)  and  helix  3  (V45, 
W48,  F49),  upon  which  the  three-helical  fold 
is  formed.  Helices  2  and  3  form  the  canonical 
helix-turn-helix  DNA  recognition  motif  that  is 
present  in  homeodomains,5  and  was  first  seen  in 
the crystal structures of the cI and Cro prokaryotic
repressor proteins from bacteriophage lambda.76,77
Additional  stability  for  the  homeodomain  fold  is 
provided  by  several  long-range  hydrogen  bonds/ 
electrostatic  interactions  seen  in  the  ensemble  of 
structures,  including  bonds  between  Q12  (terminal 
side-chain amide protons)/L38 (backbone carbonyl
oxygen atom), E17 (terminal side-chain oxygen
atom)/R52  (terminal  side-chain  amino  protons), 
R24  (backbone  carbonyl  oxygen  atom)/R53 
(terminal  amino  protons),  and  E15  (terminal  side- 
chain  oxygen  atom)/K37  (terminal  side-chain 
amino  protons)  (Figure  1(c)). 
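
The restraint counts quoted in the text and in Table 1 can be cross-checked with simple bookkeeping. The sketch below only re-adds numbers given in the paper; the variable names are hypothetical. The final line rests on an assumption that is not stated in the source, namely that the "~22 restraints per amino acid residue" figure is taken over the 49 residues (10-58) used for the RMSD statistics.

```python
# Restraint counts as listed in Table 1
protein = {
    "long_range": 133,
    "medium_range": 250,
    "short_range": 693,
    "hydrogen_bond": 34,
    "angle": 134,
}
dna = {
    "intrabase": 126,
    "interbase": 66,
    "watson_crick": 55,
    "angle": 200,
}
intermolecular = 33

protein_distance = (
    protein["long_range"] + protein["medium_range"] + protein["short_range"]
)
protein_total = sum(protein.values())
dna_proton_distance = dna["intrabase"] + dna["interbase"]
dna_total = sum(dna.values())
grand_total = protein_total + dna_total + intermolecular

print("protein distance restraints:", protein_distance)
print("total protein restraints:", protein_total)
print("DNA proton-proton distance restraints:", dna_proton_distance)
print("total DNA restraints:", dna_total)
print("all restraints:", grand_total)

# Assumption: per-residue density computed over residues 10-58 (49 residues)
ordered_residues = 58 - 10 + 1
print("distance restraints per residue:", round(protein_distance / ordered_residues, 1))
```

The sums agree with the totals reported for the protein (1244), the DNA (447), and the complete complex (1724).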

Calculation  of  DNA  structure 

A  schematic  of  the  duplex  DNA  and  the 
numbering  system  used  here  can  be  seen  in 
Figure  2(a).  Specific  patterns  of  proton-proton 
distances  seen  in  the  2D  nuclear  Overhauser  effect 
spectroscopy  (NOESY)  spectrum  of  the  DNA 
indicated  that  the  duplex  was  adopting  the 
B-DNA  conformation  (e.g.  the  short  distances 
between the H2' and H2", and the H1' and H5",
protons  of  sequential  same-strand  deoxyribose 


sugars).78  This,  in  conjunction  with  previously 
determined NMR solution structures of homeodomain-DNA-binding
sites shown to adopt the
B-DNA  conformation,  allowed  us  to  model  our 
DNA  sequence  onto  a  general  B-DNA  structure 
using  the  subprogram  NUCGEN79  in  the  AMBER 
suite80  (Figure  2(b),  red).  This  model  was  subjected 
to  10  ps  of  NMR-refinement  using  192  DNA 
proton-proton  distance  restraints,  Watson-Crick 
restraints  (to  maintain  base-pairing),  and  200 
angle  restraints  (standard  B-DNA)  using  the 
AMBER  all-atom  force  field  with  the  Generalized 
Born  solvation  model,81  resulting  in  a  DNA 
structure  (Figure  2(b),  green)  that  reflects  the 
DNA  NMR-derived  distance  restraints.  This  DNA 
structure  was  then  used  during  the  docking 
calculations  with  the  Bicoid  homeodomain 
structure  (Figure  2(b),  blue). 
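
The Watson-Crick restraints mentioned above pair each base of the sense strand with its complement on the antisense strand. As a small illustration, the sketch below derives the antisense partner of the consensus site and lists the base pairs. It covers only the six-base consensus site quoted in the text, because the full 13mer sequence is not reproduced here; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Complementary strand read in the opposite polarity (3'->5')."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Complementary strand written in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


def watson_crick_pairs(strand: str):
    """List of (sense base, antisense base) pairs along the site."""
    return list(zip(strand, complement(strand)))


sense = "TAATCC"                            # 5'-TAATCC-3'
antisense_3to5 = complement(sense)          # 3'-ATTAGG-5', as written in the text
antisense_5to3 = reverse_complement(sense)  # same strand written 5'->3'

print(antisense_3to5)
print(antisense_5to3)
print(watson_crick_pairs(sense))
```

The two guanines that pair with the CC dinucleotide are the bases on the antisense strand discussed in connection with K50.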

Calculation  of  Bicoid  homeodomain-DNA 
complex  structure 

A 2D 13C(ω1)-edited, [13C,15N](ω2)-filtered
NOESY spectrum provided 33 unambiguous intermolecular
distance restraints (Table 2; pertinent
regions of the spectrum are available in Supplementary
Data, Figure S3). Note that these 33 restraints
describe interactions between protons separated by
distances <6 Å, and do not necessarily indicate the
presence  of  hydrophobic,  electrostatic,  or  hydrogen 
bond  interactions  between  the  atoms.  Additionally, 
these  33  NOEs  do  not  necessarily  describe  all 
regions  of  close  proximity  between  the  protein 
and  the  DNA,  due  to  technical  limitations  of  the 
experiment,  such  as  ambiguity  in  the  assignment  of 
unused  NOEs  and  absence  of  possible  NOE  peaks 
due  to  limited  sensitivity  (e.g.  exchange-broadened 
Figure 2. Description of the DNA site used for this study and
its structure. (a) Schematic and numbering scheme of the 13mer
DNA duplex used for this study. The 5' ends of each strand are
base numbers 1 and 14. (b) Three structures of the DNA seen
during various stages of the structure calculation. The original
B-DNA model created using the program NUCGEN (red), the mean
structure after the initial docking calculation (green), and the
mean structure after the final energy minimization in the absence
of all constraints (blue) are superimposed.


peaks).  Designations  of  strong,  medium,  and  weak 
(indicated  in  parentheses  in  Table  2)  were  assigned 
to  each  NOE  on  the  basis  of  peak  volumes  when 
possible,  and  by  comparison  and  number  of 
contour  lines  when  peaks  were  overlapped,  and 
were given upper distance limits of 4 Å, 5 Å, and
6 Å, respectively. Overall, five amino acid residues
were in close proximity to the DNA, as detected by
this NMR experiment, three of which are in helix 3
("recognition helix") (I47, K50, and R54), one of
which is located in the turn between helices 1 and 2
(Y25), and one of which is located in the N-terminal
arm (R3). Nineteen of the 33 NOEs involved DNA
bases in the TAATCC/ATTAGG consensus
sequence, and ten of the 33 NOEs involved base
protons (e.g. H8 and H6).
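
The mapping from NOE strength to upper distance limit described above can be sketched as a small data structure. The five restraints shown are taken from Table 2; the class name, field names, and the `violation` helper are hypothetical and are not part of CYANA or AMBER.

```python
from dataclasses import dataclass

# Upper distance limits (Å) for strong, medium, and weak NOEs, as stated in the text
UPPER_LIMIT_ANGSTROM = {"s": 4.0, "m": 5.0, "w": 6.0}


@dataclass(frozen=True)
class IntermolecularRestraint:
    dna_base: str
    dna_atom: str
    residue: str
    protein_atom: str
    strength: str  # "s", "m", or "w"

    @property
    def upper_limit(self) -> float:
        return UPPER_LIMIT_ANGSTROM[self.strength]


def violation(restraint: IntermolecularRestraint, distance: float) -> float:
    """Amount (Å) by which a model distance exceeds the upper limit, or 0.0."""
    return max(0.0, distance - restraint.upper_limit)


# A subset of the restraints listed in Table 2
restraints = [
    IntermolecularRestraint("A7", "H8", "I47", "MG", "s"),
    IntermolecularRestraint("T8", "H6", "I47", "MD", "m"),
    IntermolecularRestraint("G16", "H4'", "Y25", "QD", "s"),
    IntermolecularRestraint("G17", "H8", "K50", "HD2", "m"),
    IntermolecularRestraint("G17", "H8", "K50", "HE2", "w"),
]

for r in restraints:
    print(f"{r.dna_base} {r.dna_atom} - {r.residue} {r.protein_atom}: <= {r.upper_limit} Å")
```

Because only upper limits are imposed, a restraint of this kind is satisfied by any distance below the limit, which is why the text cautions that an NOE does not by itself indicate a specific hydrogen bond or electrostatic contact.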

All  1724  restraints  described  in  Table  1  were  used 
to  calculate  the  docked  protein-DNA  structure  as 
described  in  Materials  and  Methods.  Briefly,  the  20 
protein structures provided by the CYANA program
were placed 50 Å away from the NMR
restraint-refined,  double-stranded  DNA  duplex. 
The  DNA  was  also  rotated  to  achieve  five  different 
starting  orientations  relative  to  each  of  the  20 
protein  structures  of  the  ensemble,  which  resulted 
in  100  different  starting  protein-DNA  coordinate 
files  which  were  input  into  the  AMBER  program. 
The  resulting  ensemble  of  docked  structures 
(Figure  3)  represents  the  20  structures  with  the 
lowest  violations  of  restraint  data  (Table  1C). 
Ramachandran analysis82 via PROCHECK83 indicated
that 88.5% of the protein backbone residues
were in the most favored conformational regions,
with 10.3%, 0.7%, and 0.4% in the additionally
allowed,  generously  allowed,  and  disallowed 
regions,  respectively  (see  Supplementary  Data  S4). 
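
The docking setup described above, 20 protein conformers each paired with five DNA orientations at a 50 Å separation, can be enumerated as follows. This is a sketch of the bookkeeping only, not of the AMBER calculation. The evenly spaced rotation angles are an assumption made for illustration, since the text does not say how the five orientations were chosen, and all names are hypothetical.

```python
import itertools


def starting_configurations(n_conformers=20, n_orientations=5, separation_angstrom=50.0):
    """Enumerate every (protein conformer, DNA orientation) starting pair."""
    configs = []
    for conformer, k in itertools.product(
        range(1, n_conformers + 1), range(n_orientations)
    ):
        configs.append(
            {
                "conformer": conformer,
                # Assumed: orientations evenly spaced about one axis
                "dna_rotation_deg": 360.0 * k / n_orientations,
                "separation_angstrom": separation_angstrom,
            }
        )
    return configs


configs = starting_configurations()
print(len(configs), "starting coordinate sets")
print(configs[0])
print(configs[-1])
```

Starting from several relative orientations reduces the chance that the final docked ensemble reflects the initial placement rather than the restraint data.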

Tertiary  structure  of  the  Bicoid  homeodomain: 
comparison  to  other  homeodomains 

To better understand the unique multi-functionality
of the Bicoid homeodomain, a comparative
analysis of the backbone structures (Cα coordinates)
of multiple homeodomains was performed.
A structural alignment of 16 homeodomain structures,
including the Bicoid homeodomain, was
obtained  using  the  MAMMOTH-mult  server84  at 
the  Centro  de  Biologia  Molecular  "Severo  Ochoa" 
(CBMSO),  which  provides  a  rapid  method  to  derive 
a superposition of the Cα coordinates of multiple
input  structures  (see  Materials  and  Methods  for  the 
homeodomains  used;  sequence  alignment  and  an 
RMSD  plot  are  available  in  Supplementary  Data, 
Figure  S5).  Due  to  variability  in  the  lengths  of  the  N 
and C termini of homeodomains, structural alignment
was performed on residues 8-55 of the
homeodomains only, resulting in a pair-wise
RMSD of 0.93 Å, confirming that the Bicoid homeodomain
adopts an overall global fold similar to
previously solved homeodomain structures. Analysis
of the local RMSDs (RMSD of three amino acid
segments) for the Bicoid homeodomain structure
revealed that the positions of the Cα atoms of the
amino acid residues at the end of helix 1 deviated
from the other homeodomain structures by nearly
2.5 Å. The residues involved in this local deviation
Table 2. Protein-DNA intermolecular restraints

DNA base    DNA atom    Protein residue    Protein atom(s)
A6          H2"         I47                MD(m)ᵃ
A6          H2"         I47                MG(m)ᵇ
A7          HP          I47                MD(m)
A7          HP          I47                MG(m)
A7          H2"         I47                MD(m)
A7          H2"         I47                MG(m)
A7          H3'         I47                MD(s)
A7          H3'         I47                MG(m)
A7          H8          I47                MD(m)
A7          H8          I47                MG(s)
T8          H6          I47                MD(m)
T8          H6          I47                MG(m)
G16         H3'         Y25                QD(m)ᶜ
G16         H3'         Y25                QE(m)
G16         H4'         Y25                QD(s)
G16         H4'         Y25                QE(s)
G16         Q5'         Y25                QD(m)
G16         Q5'         Y25                QE(m)
G17         H3'         R54                QD(m)
G17         H4'         R54                QD(m)
G18         Q5'         R54                QD(m)
G17         H8          K50                HE2(w)
G17         H8          K50                HE3(w)
G17         H8          K50                HG2(w)
G17         H8          K50                HD2(m)
All         H3'         R3                 HD3(s)
All         Q5'         R3                 HD2(m)
All         Q5'         R3                 HD3(m)
All         Q5'         R3                 QG(m)
G23         Q5'         R3                 HD2(s)
G23         Q5'         R3                 HD3(s)
G23         Q5'         R3                 QG(m)
G23         H8          R3                 QG(m)

ᵃ Strength of the intermolecular NOEs are in parentheses: (s)
4 Å; (m) 5 Å; (w) 6 Å.
ᵇ MG and MD correspond to the two methyl groups of the
isoleucine side-chain.
ᶜ For atoms that did not have a stereospecific assignment,
restraints to pseudoatoms were used.


Figure  3.  Superimposed  ensemble  of  the  20  lowest 
energy  structures  from  the  AMBER  docking  calculation. 
Helices 1, 2, and 3, and the N terminus of the homeodomain
are labeled. The sense and anti-sense strands are
labeled A and B, respectively. Cytosine (cyan), guanosine
(blue), thymine (yellow), and adenine (red) bases,
deoxyribose sugars (magenta), and the phosphate backbone
(black) are color-coded. Helix 3 inserts into the
major  groove  and  the  N-terminal  arm  wraps  around  and 
contacts  the  minor  groove.  Statistics  for  the  ensemble  are 
given  in  Table  1. 


include Q18, H19, F20, and L21 in helix 1, and
residues Q22 and G23 in the turn between helices 1
and 2. A recent alanine scanning study of the
engrailed homeodomain indicated that, while the
side-chains of residues 18, 19, and 21 of this region
were relatively unimportant for maintenance of
engrailed function, residues 22 and 23 showed a
strong preference for the wild-type residue,
indicating the importance of side-chain identity at
these positions for DNA recognition
by the homeodomain.28 The requirement for
phenylalanine in position 20 is well demonstrated
by its conservation across the homeodomain family
and its presence in the conserved hydrophobic core.
However, the strong preferences for the E22 and
N23 residues of the engrailed homeodomain were
unexpected. To assess the relative importance of
these residues to the homeodomain family, we
analyzed the prevalence of amino acids at these
positions across the 1063 known homeodomain
sequences. Position 22 of the homeodomain showed
very little evolutionary conservation, while position
23 showed a stronger preference for the asparagine
residue (37%) observed in the engrailed study.
Additionally, a clear preference for larger
side-chains with hydrogen bonding capability can be
seen for position 23 (see the Table in Supplementary
Data, S6). Therefore, we hypothesize that the local
RMSD differences seen in the Bicoid homeodomain
structure are caused by the presence of a glycine
residue in position 23. Glycine is seen at this position
in only 12 of the 1063 known homeodomain sequences,
and no other G23 homeodomain structures are available.
The glycine would alter the turn between helix 1 and
helix 2, allowing helix 1 to move closer to helix 2, as
seen in Figure 4.
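
The prevalence analysis described above amounts to counting residues in one column of a sequence alignment. The following is a minimal sketch of that calculation, assuming the sequences are already aligned so that a given index refers to the same homeodomain position in every sequence. The function name and the toy fragments are hypothetical and are not taken from the 1063-sequence data set.

```python
from collections import Counter


def residue_frequencies(aligned_sequences, position):
    """Return {residue: fraction} at a 1-based alignment position.

    Assumes every sequence is already aligned, so that index
    position - 1 refers to the same position in each sequence.
    """
    column = [seq[position - 1] for seq in aligned_sequences]
    counts = Counter(column)
    total = len(column)
    return {residue: n / total for residue, n in counts.items()}


# Hypothetical toy alignment: four invented 3-residue fragments standing
# in for positions 21-23. These are NOT real homeodomain sequences.
toy_alignment = ["LEN", "LQN", "LKG", "LTN"]

# The third column of the toy fragments plays the role of position 23.
freqs = residue_frequencies(toy_alignment, 3)
print(freqs)

# Fraction of sequences with glycine at position 23, using the counts
# quoted in the text (12 of 1063).
glycine_fraction = 12 / 1063
print(round(100 * glycine_fraction, 1))
```

For scale, 12 of 1063 sequences corresponds to roughly 1.1% of the known homeodomains.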

Basis  of  DNA  recognition  by  the  Bicoid 
homeodomain 

The intermolecular NOEs listed in Table 2, in
combination with the intramolecular protein and
DNA NOEs used to describe the structure of each
component separately, describe (a) the global
docking arrangement of the homeodomain on the DNA,
and (b) the orientation of specific side-chains in the
interface between the two. From these data alone, it
is possible to draw some conclusions about specific
side-chain-DNA contacts during recognition by the
Bicoid homeodomain. However, a more extensive
analysis of the basis of protein-DNA recognition,
including water-mediated DNA recognition, can be
achieved by using solvated molecular dynamics
simulations, which approximate the energy of the
system, including the effects of water molecules,
salt concentrations, temperature, electrostatics,
hydrophobic effects, etc. through the application
of molecular and solvent force fields. The results of
the AMBER docking calculation, in conjunction
with the experimentally observed NOEs, are
discussed here.

Figure 4. Superimposed Cα atom ribbon diagrams of 16
homeodomains (see the text for a list). The region of high
local RMSD difference between the Bicoid homeodomain
(red) and the rest (grey) is labeled with a red arrow. The
Bicoid homeodomain contains glycine in position 23,
which results in reduced hydrogen-bonding ability to
nearby residues and an alteration of the homeodomain
structure when compared to the 15 other homeodomain
structures. The functional significance of this difference is
unknown.

Overall, three distinct regions of the homeodomain
are involved in recognition of the DNA site:
the  N-terminal  arm,  the  turn  between  helices  1  and 
2,  and  the  recognition  helix.  The  N-terminal 
arm  (residues  1-9),  which  has  greater  sequence 
variability  among  homeodomains  than  the  more 
conserved  helical  regions  (residues  10-58),  has  been 
shown  in  other  homeodomain  studies  to  be 
unstructured  in  the  absence  of  DNA  and  to  wrap 
around  and  make  contacts  in  the  minor  groove 
when  bound  to  DNA.  This  is  corroborated  in  our 
Bicoid  homeodomain-DNA  structure  by  the 
presence  of  observed  intermolecular  NOEs 
(Table  2),  and  resultant  position  in  the  docked 
structure  (Figure  3). 

While interaction between Y25, located in the
turn between helices 1 and 2, and the DNA
(Figure 5(a)) is supported by the presence of six
intermolecular NOEs to the sugar protons of base
G16 (ATTAGGG), the functional significance of this
residue is debatable. An engrailed homolog
shotgun scanning study28 showed that there was
essentially no preference between tyrosine and
phenylalanine for position 25 (F:Y, 1.7:1) and
concluded that the primary role of an aromatic
residue in position 25 is to form a π-cation
interaction with R53. However, this conclusion is
not necessarily supported by further analysis of the
available homeodomain sequences, which reveals
a predominance of tyrosine in position 25 (69%)
compared to other amino acids (lysine 12%,
arginine 5%, phenylalanine 1%), indicating that
the function of Y25 in homeodomains may be a
more complex combination of hydrogen bonding
and cation-π interactions with other amino acids of
the homeodomain and bases of the DNA.85 Another
appealing, and untested, theory for the prevalence
of tyrosine in position 25 is the possibility of
phosphorylation,86 which may affect DNA binding
through repulsion between the negative charges of the
phosphorylated tyrosine and the phosphate
backbone of the DNA.

Figure 5. Interactions between the Bicoid homeodomain and the DNA. (a) The ribbon diagram of the Bicoid
homeodomain (red) and the line diagram of the DNA (blue) are indicative of the mean structure of the protein-DNA
complex. Six intermolecular NOEs placed Y25 (magenta) in close proximity to the DNA. A cation-π interaction with R53
(yellow) is seen in our structure, as well as other homeodomains. (b) Residues of the third helix involved in DNA
recognition. The side-chains of K46 (red), I47 (magenta), K50 (green), N51 (blue), R54 (cyan), and K57 (yellow) are shown
in relation to the protein backbone (grey). The bases of the recognition site are labeled (TAATCC/ATTAGG). Select atoms
of the protein side-chains of interest (for lysine the terminal HZ1, HZ2, and HZ3 atoms, for isoleucine the HD1, HD2, and HD3 methyl
protons, for N51 the HD21 and HD22 atoms, and for R54 the terminal HH11, HH12, HH22, and HH21 protons) are shown by spheres
representing the RMSD of that atom from the mean, indicating the variability of those atoms in the ensemble.

The most extensive region of the homeodomain
involved in binding to the DNA site is the third
"recognition" helix (Figure 5(b)). Interactions
between three amino acids of the recognition helix
(I47, K50, and R54) and the DNA were observed
experimentally, providing 19 of the 33 detected
intermolecular NOEs, and the majority of the
information used to describe the global homeodomain-DNA
docking orientation and the local orientation
of these three side-chains in the protein-DNA
interface. Interactions between I47 and the DNA
were hydrophobic in nature, and involved close
contacts between the methyl groups of the
isoleucine side-chain and the methyl group of T8 and
base/sugar protons of T8, A7, and A6 (TAATCC).
K50 was involved in direct and water-mediated
hydrogen bonds to bases of the sense and anti-sense
strands of the consensus site (TAATCC/ATTAGG),
and R54 was involved in direct and water-mediated
hydrogen bonds to the G18 and A19 bases of the
anti-sense strand of the consensus site (ATTAGG).
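
The base labels used here (for example A6, T8, G17, and A19) can be related to the two strands of the DNA duplex with a short script. The following is a minimal sketch, assuming that residues are numbered consecutively along the sense strand (5' to 3') and then along the anti-sense strand (5' to 3'). This numbering scheme is an assumption inferred from the labels in the text, and the function names are illustrative. The anti-sense strand is taken from Materials and Methods, and the sense strand is derived as its reverse complement.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def number_duplex(sense, antisense):
    """Label bases 1..N on the sense strand, then N+1..2N on the anti-sense.

    Both strands are read 5' to 3'. The consecutive numbering is an
    assumption made for this illustration.
    """
    labels = {}
    for i, base in enumerate(sense + antisense, start=1):
        labels[f"{base}{i}"] = i
    return labels


# Anti-sense strand as given in Materials and Methods (5' to 3').
antisense = "CGGGGATTAGAGC"
sense = reverse_complement(antisense)

labels = number_duplex(sense, antisense)
quoted = ("A6", "A7", "T8", "G16", "G17", "G18", "A19", "G23")
print(sense)
print([name for name in quoted if name in labels])
```

Under this assumption every base label in the list above is recovered, and the TAATCC consensus site spans positions 5 to 10 of the sense strand.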

Crystallographic  studies  of  homeodomain-DNA 
complexes  invariably  indicate  that  the  N51  side- 
chain  forms  a  pair  of  hydrogen  bonds  to  the  second 
adenine  of  the  TAAT  core,  donating  a  hydrogen 
bond  to  the  adenine  N7  and  accepting  a  hydrogen 
bond  from  the  N6.  While  no  intermolecular  NOE 
was  identified  unambiguously  between  N51  and 
the  DNA  in  our  Bicoid  data,  the  global  positioning 
provided  by  the  other  observed  intermolecular 
contacts  places  this  side-chain  in  close  proximity 
to  A6  and  A7  of  the  consensus  site  (TAATCC),  as 
expected  due  to  its  conservation  and  previously 
studied role in DNA recognition by homeodomains.
(A potential, weak intermolecular NOE
between  the  N51  side-chain  atom  HD22  and  the  H8 
proton  of  A7  was  observed  but  not  used  in  the 
calculations,  due  to  ambiguity  in  assignment.)  Very 
large, downfield chemical shifts were observed in
both the 15N and 1H resonance frequencies for
the terminal amide group of N51 (spectrum in
Supplementary Data S7), which appears to be a
conserved characteristic, as similar shifts have been
observed in previous NMR studies of the PITX2,36
Antennapedia,87 and vnd/NK-2 homeodomain-DNA
complexes.45 These unusually large
shifts  are  presumably  caused,  at  least  in  part,  by 
the  positioning  of  the  terminal  amide  group  near 
the  plane  of  an  aromatic  base  (ring  current  shifts) 
and  by  hydrogen  bonding  to  a  DNA  base,  most 
likely  the  second  adenine  of  the  TAAT  core  (TAAT), 
consistent  with  the  conserved  structural  role  of  N51 
during DNA recognition. In addition to the
characteristic downfield chemical shifts, substantial
line-broadening of the N51 side-chain amide proton
resonances was observed in our NMR data,
indicating the presence of microsecond-millisecond
timescale fluctuations in the magnetic environment
of the amide protons. The occurrence of such
line-broadening, which could explain why no
intermolecular NOE was assigned between the
side-chain amide protons and the DNA, appears to be
another conserved characteristic feature of N51 in
homeodomain-DNA complexes; similar observations
were reported in the PITX2,36 Antennapedia,87
and vnd/NK-2 studies.45 The origin of the
fluctuations  in  the  magnetic  environment  of  the 
N51  side-chain  protons  is  currently  unknown. 
Billeter  et  al.  suggested  that  the  lack  of  observed 
intermolecular  NOEs  involving  the  N51  side-chain 
and  the  presence  of  line-broadening  could  be  due  to 
a fluctuating network of weak interactions involving
both adenine bases of the TAAT core and
nearby  water  molecules.35  On  the  other  hand, 
Gruschus  et  al.  did  report  the  observation  of 
intermolecular  NOEs  between  the  N51  side-chain 
amide  protons  and  the  DNA,  possibly  indicative  of 
the  more  stable  and  specific  interactions  described 
in  the  crystallographic  studies,  and  suggested  that 
the  line-broadening  could  be  due  to  millisecond 
timescale  motions  of  the  side-chain  and  to  strong 
interactions  with  nearby  water  molecules.40  Based 
on  their  comparative  study  of  the  X-ray  and  NMR 
structures  of  the  Antennapedia  homeodomain- 
DNA  complex,  Fraenkel  and  Pabo  hypothesized 
that  several  types  of  motions  could  lead  to  the 
observed  line-broadening  of  the  N51  side-chain 
resonances,  and  that  such  motions  could  still  be 
consistent  with  the  picture  of  the  specific  hydrogen 
bonding  pattern  seen  in  the  X-ray  data.88  The 
suggested  motions  include  occasional  transitions 
to  conformations  significantly  different  from  that 
observed  in  the  X-ray  structure,  modest  fluctuations 
in the χ2 angle of N51, and dynamic changes in the
solvent  structure  around  the  N51  side-chain.  Our 
Bicoid  data  are  consistent  with  the  model  proposed 
by  Fraenkel  and  Pabo. 

Role  of  K50  and  R54  in  DNA  recognition 

Due to the ongoing discussion in the literature
concerning the role of side-chain motion and
water-mediated DNA recognition in the homeodomain
family, and the role of R54 in discriminating
between consensus and non-consensus DNA sites
and RNA, we performed a more detailed analysis of
the behavior of the K50 and R54 side-chains. Our
analysis included the frequency and distribution of
direct and water-mediated interactions between
these side-chains and specific bases of the DNA.
As indicated earlier, four intermolecular NOEs
between K50 and base G17 and three between R54
and bases G17 and G18 provided valuable
information regarding the orientation of the recognition
helix in the major groove of the DNA. However,
the majority of these NOEs were relatively
weak and were consequently assigned a loose upper
limit of 6 Å, which allowed greater conformational
flexibility for these side-chains during the docking
calculation.

Table 3. Direct and water-mediated K50-DNA hydrogen bonds

DNA base   DNA atom   Protein residue   Protein atom   Number in ensemble   Base specific?   Direct or water-mediated
G17        O6         K50               QZ             1/20                 Y                Water-mediated
G18        O6         K50               QZ             1/20                 Y                Water-mediated
T8         O4         K50               QZ             2/20                 Y                Direct
G18        O6         K50               QZ             5/20                 Y                Direct
C9         O4         K50               QZ             5/20                 Y                Water-mediated
G17        N7         K50               QZ             13/20                Y                Direct

Analysis of the K50 side-chain reveals that 20
direct and eight water-mediated base-specific
hydrogen bonds are seen in the 20 conformer
ensemble, contacting four different bases of the
consensus site (5'-TAATCC-3'/3'-ATTAGG-5'). The
predominant interactions between the terminal
amino protons of the K50 side-chain and the DNA
are direct hydrogen bonds to the N7 atom of base
G17 (ATTAGG, Table 3). This is consistent with the
results seen in the X-ray structure of the
engrailed Q50K/DNA complex and in the PITX2/DNA
structure. This interaction was seen in 13 of 20 members
of the ensemble, followed in frequency by direct
hydrogen bonds to the O6 atom of G18 (ATTAGG,
seen in five out of the 20 members of the ensemble),
and water-mediated hydrogen bonds to the O4
atom of C9 (TAATCC, also five out of 20). Hydrogen
bonds were sometimes bidentate, but were seen
also in a variety of conformations involving a
greater number of DNA bases of the consensus
site than is possible from one or two significantly
populated conformations, as suggested by the
engrailed Q50K/DNA crystal structure (see
Supplemental Data, Figure S8). These observations
demonstrate that there are many conformations of
the K50 side-chain that (a) satisfy intramolecular
protein-protein NOEs involving K50 side-chain
protons, (b) satisfy the observed intermolecular
protein-DNA NOEs, and (c) satisfy the energetic
parameters incorporated in the AMBER force field.


Additional evidence for multiple conformations
of K50 is provided by the experimentally observed
line broadening of the K50 side-chain resonances,
evidence that was seen also for the side-chain of the
PITX2 K50 residue, and for the side-chains in other
protein-DNA complexes,45,87,89,90 and can be
indicative of side-chain motion on the intermediate
timescale (vide infra). Line broadening of these
resonances could be caused by many factors,
including protein side-chain motion, DNA motion,
and/or ring current effects from DNA bases or
nearby aromatic residues. To address some of
these possibilities, the behavior of the side-chain
resonances of another lysine residue that contacts
DNA, K46, was also examined. In Figure 6(a),
resonances of the K46 and K50 side-chains, detected
through their respective Hε resonance frequencies,
are compared (a comparison made easier by the
unusual lack of significant spectral overlap for these
Hε resonances). Resonances of the K50 side-chain
were very weakly detected in this HCCH-total
correlated spectroscopy (TOCSY) experiment, as
well as in the rest of the NMR experiments used
in this study. The corresponding resonances of the
K46 side-chain are, in comparison, strong and
well-resolved. Line-broadening on the scale of that
observed for the K50 side-chain was not seen for the
DNA protons, suggesting that the line broadening
was not caused by DNA motion. The variety of
K50 conformations seen in the ensemble and the
line broadening observed for the K50 side-chain
resonances, while not indisputable evidence for K50
side-chain motion, taken together, are highly
suggestive of a dynamic role for K50 during
recognition of DNA. Indications for side-chain
dynamics of K50 were observed also in molecular
dynamics simulations that we performed (see
Conclusions).

Figure 6. Line broadening of the K50 and R54 side-chain resonances. (a) The top panel shows the HCCH-TOCSY peaks
of the K46 side-chain detected through the Hε resonance compared to the same resonances of the K50 side-chain (bottom
panel). (b) The top panel shows the HCCH-TOCSY peaks of the R54 side-chain detected through the Hγ resonance
compared to the same resonances of the R55 side-chain (bottom panel).

Table 4. Direct and water-mediated R54-DNA hydrogen bonds

DNA base   DNA atom   Protein residue   Protein atom   Number in ensemble   Base specific?   Direct or water-mediated
A19        H8         R54               QH1/2          1/20                 Y                Water-mediated
G18        N7         R54               QH1/2          2/20                 Y                Direct
A19        H62        R54               QH1/2          2/20                 Y                Water-mediated
G18        O2P        R54               QH1/2          3/20                 N                Water-mediated
G18        O2P        R54               QH1/2          5/20                 N                Direct
G18        O2P        R54               HE             6/20                 N                Water-mediated
A19        N7         R54               QH1/2          8/20                 Y                Direct
G18        N7         R54               QH1/2          9/20                 Y                Water-mediated
A19        N7         R54               QH1/2          13/20                Y                Water-mediated

The R54 side-chain, key to differentiation
between consensus and non-consensus DNA and
RNA-binding sites, was observed to make a greater
number of hydrogen bonds to the DNA than K50
(K50:R54, 27:49). The Nε group of R54 was shown to
make six non-specific, water-mediated hydrogen
bonds to the phosphate backbone of DNA base G18
(3'-ATTAGG-5'), leaving the two terminal amine
groups to form the majority of base-specific contacts
to the DNA. The predominant contact between R54
and the DNA observed in the ensemble was to the
N7 atom of A19 (3'-ATTAGG-5'), both through
direct (8/20) and water-mediated (13/20) hydrogen
bonds (Table 4; see Supplementary Data Figure S9).
The second most frequently seen base-specific
contact involved the N7 atom of G18, to which
direct (2/20) and water-mediated (9/20) hydrogen
bonds were observed, followed in frequency
by eight non-specific contacts to the phosphate
backbone of G18 (direct:water-mediated, 5:3).
The frequency and position of the R54/DNA
interactions, similar to the previously discussed
K50 side-chain, place the terminal amine groups in
multiple conformations that support the possibility
that this side-chain is in motion as well. Line
broadening for R54 was examined and compared
to resonances for R55, another amino acid in
close proximity to the DNA (Figure 6(b)). Line
broadening for this side-chain is observed also,
although not as dramatically as that seen for K50,
corroborating the possibility of motion on the
intermediate timescale.
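
The hydrogen-bond counts quoted for the two side-chains can be recomputed from the rows of Tables 3 and 4. The sketch below is a simple bookkeeping illustration, not the analysis pipeline used in this study. The tuple layout and function names are assumptions made for this example, and the rows are transcribed from the tables.

```python
# Each row: (DNA base, DNA atom, count out of 20, base specific?, contact type)
K50_BONDS = [
    ("G17", "O6", 1, True, "water"),
    ("G18", "O6", 1, True, "water"),
    ("T8", "O4", 2, True, "direct"),
    ("G18", "O6", 5, True, "direct"),
    ("C9", "O4", 5, True, "water"),
    ("G17", "N7", 13, True, "direct"),
]
R54_BONDS = [
    ("A19", "H8", 1, True, "water"),
    ("G18", "N7", 2, True, "direct"),
    ("A19", "H62", 2, True, "water"),
    ("G18", "O2P", 3, False, "water"),
    ("G18", "O2P", 5, False, "direct"),
    ("G18", "O2P", 6, False, "water"),
    ("A19", "N7", 8, True, "direct"),
    ("G18", "N7", 9, True, "water"),
    ("A19", "N7", 13, True, "water"),
]


def total_bonds(rows):
    """Sum the hydrogen-bond counts over all rows of a table."""
    return sum(row[2] for row in rows)


def bonds_by(rows, key_index):
    """Sum the counts grouped by one column of the row tuple."""
    totals = {}
    for row in rows:
        key = row[key_index]
        totals[key] = totals.get(key, 0) + row[2]
    return totals


k50_total = total_bonds(K50_BONDS)
r54_total = total_bonds(R54_BONDS)
# Column 3 is the base-specific flag: True for base contacts,
# False for contacts to the phosphate backbone.
r54_by_specificity = bonds_by(R54_BONDS, 3)
print(k50_total, r54_total, r54_by_specificity)
```

The row totals reproduce the K50:R54 ratio of 27:49 quoted above, and 14 of the 49 R54 hydrogen bonds are non-specific contacts to the phosphate backbone of G18.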


Conclusions 

The results described in this study reveal several
special features of the Bcd homeodomain
structure. First, when compared with other
homeodomains, the Bcd homeodomain exhibits a
significant structural variation at the end of the first helix.
Our analysis suggests that the unique glycine
residue at position 23 may be responsible for this
variation. Second, our study reveals evidence
indicative of molecular motion of the side-chains
of K50, N51, and R54, a finding that is supported
also by preliminary results of solvated molecular
dynamics simulations of the Bicoid homeodomain-DNA
complex (our unpublished results).
Importantly, both K50 and R54 play critical roles in
recognizing both DNA and RNA, as evidenced by
previous mutation analyses,23,70,91 and our current
structural data. On the consensus TAATCC DNA
site, both side-chains make direct and
water-mediated contacts to bases in the DNA (ATTAGG
for R54, and TAATCC/ATTAGG for K50). Among
all the homeodomains, Bcd is the only one that
contains a K50/R54 combination and is currently
the only homeodomain known to have the ability to
recognize both DNA and RNA. We imagine that
some of the structural features and conformational
dynamics revealed in our current study may play
important roles in Bcd recognition of RNA
sequences as well as non-consensus DNA sites.


Materials  and  Methods 

Preparation  of  the  Bicoid  homeodomain  NMR  sample 

The 60 amino acid residue Bicoid homeodomain (plus
seven residues C-terminal to the homeodomain, added
for solubility reasons; the sequence is shown in
Supplementary Data) was amplified from a full-length Bcd
cDNA plasmid, pFY441,92 for insertion into the
pET41a(+) (Novagen) plasmid, using two primers.
Primer A: 5'-GCACGAATTCGAAAACCTGTATTTTCAGGGTCCACGTCGCACCCG-3',
and primer B: 5'-CGCGGCAAGCTTTTATTAGGACTGGTCCTTGTGCTGATCCG-3'.†
Primer A contains an EcoRI site (GAATTC), a
21 nucleotide sequence encoding the seven amino acid
residue tobacco etch virus (TEV) protease cleavage site
(GAAAACCTGTATTTTCAGGGT), and the first 14 nucleotides of the Bicoid
homeodomain. Primer B contains the last 23 nucleotides
of the Bicoid homeodomain, two stop codons (TTATTA,
read on the complementary strand), and a HindIII cleavage site (AAGCTT). Expression,
purification, and preparation of the NMR sample were
accomplished using modifications of procedures that
have been described,36 with the following exceptions: (1)
after glutathione-S-transferase (GST) resin purification,
the fusion protein was digested overnight at room
temperature with 5 mg of active TEV protease (pRK793
plasmid obtained from Dr David Waugh at the National
Cancer Institute;93 protein prepared in-house); (2) the
resultant protein pool containing TEV and the Bicoid
homeodomain was applied to 3 ml of equilibrated
His-Bind resin (Novagen) to remove the His-tagged TEV
protease, followed by an ion-exchange purification using
2 ml of equilibrated SP-Sepharose fast flow IEX resin
(Amersham). From 1 l of doubly labeled rich minimal
medium, typical yields of the Bicoid homeodomain were
between 3 mg and 5 mg. Sense (5'-GCTCTAATCCCCG-3')
and anti-sense (5'-CGGGGATTAGAGC-3') strands of the
DNA-binding site† were dissolved in nuclease-free water,
mixed in an equimolar ratio, heated to 95 °C for 15 min,
and allowed to cool to room temperature. A slight excess
of the annealed DNA complex was added to the dialysis
bag containing the purified Bicoid homeodomain. The
protein/DNA solution was concentrated by the addition
of SpectraGel absorbent (Spectrum) to the outside of the
dialysis bag at 4 °C. The absorbent was reapplied until a
final volume of 300-500 μl (0.75-1 mM complex) was
reached. The sample was then dialyzed into 10 mM NaH2PO4
(pH 7.0), and 2H2O was added to 10%, along with 1 mM PMSF,
1 mM DTT, 1 mM EDTA, 0.1 mM NaN3, 0.3 mM
leupeptin, 0.2 mM Pefabloc (Roche), and dissolved
crushed protease inhibitor tablets (Roche: one tablet
dissolved in 3 ml H2O, 1 μl added to 540 μl sample).
Samples were placed into Shigemi NMR tubes and stored
at 4 °C.
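
The composition of the two primers described above can be checked with a short script. The sketch below is illustrative and is not part of the cloning protocol; the helper names are assumptions introduced here. It uses the two primer sequences, the standard genetic code, and the EcoRI (GAATTC) and HindIII (AAGCTT) recognition sequences.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def translate(dna):
    """Translate DNA (5' to 3') in frame 1, ignoring a partial last codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


PRIMER_A = "GCACGAATTCGAAAACCTGTATTTTCAGGGTCCACGTCGCACCCG"
PRIMER_B = "CGCGGCAAGCTTTTATTAGGACTGGTCCTTGTGCTGATCCG"

# Primer A: EcoRI site, 21-nucleotide TEV site, then the gene-specific part.
ecori_end = PRIMER_A.index("GAATTC") + 6
tev_site = PRIMER_A[ecori_end:ecori_end + 21]
gene_start = PRIMER_A[ecori_end + 21:]

# Primer B anneals to the opposite strand, so the stop codons are read
# on its reverse complement.
hindiii_end = PRIMER_B.index("AAGCTT") + 6
stops = reverse_complement(PRIMER_B[hindiii_end:hindiii_end + 6])
gene_end = PRIMER_B[hindiii_end + 6:]

print(translate(tev_site), len(gene_start), stops, len(gene_end))
```

The 21-nucleotide segment translates to ENLYFQG, the seven-residue TEV protease recognition sequence, and the gene-specific segments are 14 and 23 nucleotides long, in agreement with the description of the primers.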


NMR  spectroscopy  and  structure  calculation 


All experiments were performed with 600 MHz and
800 MHz Varian Inova spectrometers at 295 K and proton
chemical shifts were referenced against an external DSS
standard. Data were processed using the programs
NMRDraw/NMRPipe. Spectra were analyzed and
assigned using the program SPARKY.‡ The pulse
programming codes were written in-house. Protein 1H,
15N, and 13C resonance assignments (>95% complete,
listed in Supplementary Data S10) were made using the
following experiments: 2D 15N heteronuclear single
quantum coherence (HSQC), 3D HNCA, HNCO,
CBCA(CO)NH, HN(CO)CA, and HNCACB,95-99 2D
heteronuclear multiple quantum coherence (HMQC) and 2D
HMQC-TOCSY,100,101 2D 13C-HSQC, HBHA(CBCACO)NH,
H(CCO)NH-TOCSY, HCCH-TOCSY, TOCSY-HSQC,
and four 3D-NOESY experiments: three 15N-separated
NOESY (τ = 50 ms, 80 ms, 125 ms mixing times)102 and
one 13C-separated NOESY (τ = 150 ms). Backbone φ
dihedral angles were obtained by analysis of a 3D HNHA
spectrum.10 Unlabeled DNA proton resonances were
assigned using three 13C/15N-filtered 2D-NOESY
experiments (τ = 60 ms, 120 ms, 300 ms mixing times)62 and two
ω2-filtered 2D TOCSY experiments (τ = 42 ms, 80 ms)104 using
a standard assignment strategy78 (see chemical shift list in
Supplementary Data S11). Distance restraints were
obtained from NOEs as described.105 NOEs between the
Bicoid homeodomain and the DNA were identified in a
2D 13C(ω1)-edited, [13C, 15N](ω2)-filtered NOESY
spectrum.104,106,107 The unassigned, integrated peak lists from
two 3D-15N NOESY (τ = 50 ms, 80 ms) spectra and one
3D-13C NOESY spectrum were both assigned and used by
the program CYANA 2.0 to calculate the structure of the
homeodomain.75,108 The 20 structures with the lowest
target function (potential energy) were then used to
calculate the structure of the Bicoid homeodomain-DNA
complex using the program AMBER 7.0.80 The protein
was docked onto the DNA structure using a modified
version of a previously described protocol.105 The 20
docked structures with the lowest energies and restraint
violations were subjected to an additional 30 ps of
conjugate-gradient energy minimization with solvent
included using the SANDER module of AMBER 7.

Homeodomain  comparison 

The Bcd homeodomain Cα backbone structure
(residues 8-55) was aligned with 15 other homeodomain
structures (PDB IDs: 1YZ8 (PITX2),36 1AHD
(Antennapedia),35 1FTT (thyroid transcription factor-1),38 1NK3
(Vnd/NK-2),40 1BW5 (Isl-1),41 1FTZ (fushi tarazu),42
1CQT (POU),48 1ENH (engrailed),49 1JGG
(Even-skipped),52 1IG7 (Msx-1),53 1AU7 (Pit-1),54 1B8I
(Ultrabithorax),56 1B72 (HoxB-1),57 2HDD (engrailed Q50K),58
and 1FJL (Paired)59) using the MAMMOTH server§.84
This list includes seven NMR35,36,38,40-42 and nine
X-ray48,49,52-54,56,58,59 structures, 12 solved with DNA
present and four without DNA.38,41,42,49
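
Once two Cα traces have been superimposed, a region of high local RMSD such as the one described for the Bicoid homeodomain can be located with a per-residue calculation. The sketch below is a generic illustration and does not reproduce the MAMMOTH alignment procedure. The function name, the window size, and the toy coordinates are all assumptions made for this example.

```python
import math


def local_rmsd(coords_a, coords_b, window=3):
    """Sliding-window RMSD between two superimposed C-alpha traces.

    coords_a and coords_b are equal-length lists of (x, y, z) tuples in
    angstroms, assumed to be aligned residue by residue and already
    superimposed in a common frame. Returns one value per residue,
    computed over up to `window` residues centred on that residue.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("traces must have the same number of residues")
    squared = [
        sum((p - q) ** 2 for p, q in zip(a, b))
        for a, b in zip(coords_a, coords_b)
    ]
    half = window // 2
    profile = []
    for i in range(len(squared)):
        chunk = squared[max(0, i - half): i + half + 1]
        profile.append(math.sqrt(sum(chunk) / len(chunk)))
    return profile


# Toy data (not real coordinates): a straight 8-residue trace with 3.8 A
# spacing, and a copy in which three residues are displaced by 2.5 A.
trace_a = [(3.8 * i, 0.0, 0.0) for i in range(8)]
trace_b = [
    (x, y, z + (2.5 if i in (3, 4, 5) else 0.0))
    for i, (x, y, z) in enumerate(trace_a)
]

profile = local_rmsd(trace_a, trace_b)
peak = max(range(len(profile)), key=profile.__getitem__)
print([round(value, 2) for value in profile], peak)
```

In this toy case the profile peaks at the centre of the displaced segment, which is how a local deviation would stand out against an otherwise well-superimposed backbone.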

Data  bank  accession  codes 

The  atomic  coordinates  have  been  deposited  in  the 
RCSB  Protein  Data  Bank  with  accession  code  1ZQ3.  The 
protein  chemical  shift  assignments  have  been  submitted 
to  the  BMRB  with  accession  code  6906. 

Gene transcription can be activated or repressed. Such
seemingly simple decisions reflect the coordinated
actions of a wide array of proteins. Activators and
co-activators work together to stimulate the assembly and
activity of the machinery that transcribes the gene,
whereas repressors and co-repressors work to achieve
the opposite goal. Recent studies show that many
proteins often engage in regulatory activities and
interactions that cross the activation-repression divide.
This article discusses selected examples to illustrate the
dynamic nature of the transcriptional regulation process
and highlights the important roles of not only the
individual proteins but also their communication
system.

Transcription  is  the  first  step  in  expressing  the  genetic 
information  of  a  cell.  It  is  a  highly  regulated  process  that 
requires  the  coordinated  actions  of  many  different  pro¬ 
teins  [1,2].  For  RNA  polymerase  II  transcription,  such 
factors  can  be  loosely  categorized  into  three  broad  groups, 
(i)  General  transcription  factors  (GTFs):  these  proteins, 
together  with  the  RNA  polymerase  (RNAP),  assemble  into 
the  transcription  machinery  at  promoters.  Although  the 
majority  of  GTFs  do  not  recognize  specific  DNA  sequences, 
the  TATA-binding  protein  (TBP)  can  bind  to  the  TATA  box 
in  a  promoter  directly;  other  promoter  elements  such  as 
the  initiator  element  (INR)  can  provide  additional  speci¬ 
ficity,  particularly  for  TATA-less  promoters,  (ii)  DNA- 
binding  regulatory  proteins,  which  are  simply  referred  to 
as  transcription  factors  (TFs)  in  this  article.  These 
proteins  specifically  bind  to  regulatory  DNA  sequences 
near  -  or  sometimes  distant  from  -  gene  promoters.  These 
regulatory  sequences  include  all  DNA  elements  that  can 
influence  gene  transcription  such  as  enhancers,  silencers, 
upstream  activating  sequences  (UASs)  and  upstream 
repressing  sequences  (URSs).  Regulatory  elements  tend 
to  contain  arrays  of  binding  sites  for  TFs  and  the  specific 
arrangements  of  these  sites  can  affect  how  the  factors 
work  with  one  another,  (iii)  Co-factors:  these  proteins 
generally  do  not  bind  to  specific  DNA  sequences  them¬ 
selves  but  can  interact  with  DNA-bound  TFs.  They 
facilitate  and  coordinate  the  actions  of  TFs,  often  by 
bridging  them  to  the  transcription  machinery  or  by 
altering  the  local  chromatin  structure.  For  example,  the 
histone acetyltransferase (HAT) co-activators and the
histone deacetylase (HDAC) co-repressors can increase
and decrease the accessibility of DNA to TFs and GTFs,
respectively, by altering the acetylation status of the
histone tails [3-6]. The Swi-Snf chromatin remodeling
complexes have positive roles in transcription by facilitating
the access of TFs and GTFs to nucleosomal DNA [4,7-9].

Proteins  from  all  three  groups  work  together  to  ensure 
proper  transcription  levels  of  the  genes  inside  a  cell.  For  a 
given  gene,  the  outcomes  of  the  actions  of  these  proteins 
are  rather  simple,  with  its  expression  level  either 
increased  (activated)  or  decreased  (repressed).  However, 
many  TFs  can  work  as  both  activators  and  repressors 
depending  on  cellular  or  promoter  contexts.  In  addition, 
co-factors  that  are  generally  viewed  as  co-activators 
(e.g.  the  Swi-Snf  complexes)  or  co-repressors 
(e.g. HDACs) can have roles that contradict their stereotypic
designations. Furthermore, some GTFs have been
found to be associated with co-repressor proteins to
mediate gene silencing, rather than mediating transcriptional
activation. This article discusses the dynamic
nature  of  transcription  control  by  focusing  specifically  on 
factors  that  participate  in  two  opposite  courses  of 
regulation:  activation  and  repression  (Figure  1). 
Examples  of  TFs  that  have  dual  activator-repressor 
functions  will  be  discussed,  followed  by  cases  of  co-factors 
and  GTFs  that  are  engaged  in  distinct  interactions  that 
lead  to  activation  and  repression. 

Activator-repressor  switches  depending  on  promoter 
and  cellular  contexts 

Although  there  are  bacterial  TFs  that  can  work  both  as 
activators  and  repressors  depending  on  the  context  [10], 
this  review  will  be  limited  to  eukaryotic  proteins.  One 
well-known  mammalian  protein  is  Yin  Yang  1  (YY1),  a 
ubiquitously  expressed  zinc-finger  TF  [11,12].  YY1  has  a 
regulatory  role  for  many  target  genes,  acting  either 
positively  or  negatively  depending  on  the  promoter 
context  and  availability  of  other  proteins.  When  the 
YY1-binding site overlaps with an activator-binding site,
YY1 can act as a repressor by competing with activator
binding. YY1 can also bind to the INR of many genes,
contributing positively to transcription. In vitro transcription
experiments have shown that YY1, transcription
factor  IIB  (TFIIB,  a  GTF)  and  RNAP  are  sufficient  to 
support  basal  transcription  from  a  TATA-less  promoter 
[13].  The  availability  of  other  factors  such  as  the 
adenovirus E1A protein can also affect the regulatory
functions of YY1. It is thought that E1A can convert YY1
from a repressor to an activator by exposing a concealed
activating function of YY1. Furthermore, YY1 can interact
with co-factors such as the HAT co-activators [e.g. the
highly related proteins p300 and CREB-binding protein
(CBP)], HDAC co-repressors and a histone methyltransferase
[14]. These interactions can further influence
whether YY1 activates or represses transcription. A
recent study demonstrates that YY1 can functionally
compensate for the loss of the Drosophila melanogaster
Polycomb group (PcG) protein Pleiohomeotic in mutant
flies [15]. Because PcG proteins exist and function in large
co-repressor complexes (discussed in the following section),
this finding further highlights the roles of co-factors
in executing the regulatory functions of YY1 in vivo.
Interestingly, a new study suggests that YY1 is a negative
regulator of p53 (discussed in the next section); however,
such regulation appears to be independent of the transcriptional
activity of YY1 [16].

Figure 1. Functional switches and interactions in transcription control. Three broad
groups of proteins - and their interactions - are involved in transcription regulation:
transcription factors (activators and repressors), co-factors (co-activators and
co-repressors) and general transcription factors (GTFs). The two sides of the diagram
represent the two opposite sides of the regulation process: activation (left; red
exemplifies the firing of the transcription machinery) and repression (right). Wide
arrows indicate the functional switches of factors that cross the activation-repression
line. Thin arrows represent possible functional interactions, depicting
not only those in activation (red) or repression (blue) but also those that cross the
activation-repression divide (two-coloured arrows). The interactions among the
proteins within the same groups are not shown (e.g. activator-activator interactions,
repressor-repressor interactions).

An  essential  function  of  p53  -  a  tumour  suppressor 
protein  that  can  bind  to  DNA  -  is  to  activate  genes 
involved  in  such  essential  processes  as  cell-cycle  control, 
apoptosis  and  angiogenesis  [17,18].  The  following  genes 
have  been  shown  to  be  direct  targets  of  p53:  p21 
(an inhibitor of cyclin-dependent kinases), Bax (a pro-apoptotic
factor) and thrombospondin-1 (TSP-1, an inhibitor
of angiogenesis). p53 also represses many other genes
through  mechanisms  that  can  be  either  dependent  or 
independent  of  p53-binding  sites.  Similar  to  YY1,  p53  can 
act  as  a  repressor  by  competing  with  activators  for  DNA 
binding  or  by  blocking  the  functions  of  DNA-bound 
activators. It can also repress transcription by interacting
with GTFs or by recruiting co-repressors such as the
Sin3A-HDAC complex. It is currently not fully understood
how p53 chooses to activate or repress transcription,
but promoter contexts, such as the p53-binding
site  characteristics  [19,20]  and  the  relative  locations  of 
the  binding  sites  for  p53  and  other  factors  [21],  are  likely 
to  influence  the  decision-making  process. 

Although  there  is  a  long  list  of  TFs  that  possess  dual 
activator-repressor  functions,  two  Drosophila  proteins  are 
noteworthy  because  their  functions,  like  many  other 
proteins  in  early  embryogenesis,  are  dependent  on  their 
concentrations.  For  example,  the  Hunchback  protein  (Hb) 
positively  auto-regulates  its  own  expression  and  activates 
even-skipped  (eve)  stripe  2  expression  in  early  developing 
embryos  [22,23].  However,  Hb  represses  the  expression  of 
other target genes such as knirps (kni) and several other
eve stripes in regions of the embryos where Hb is at lower
concentrations [24]. What makes Hb work as an activator
or a repressor in different contexts is currently not well
understood but enhancer structure, Hb concentration and
interactions with other proteins including GTFs and
co-factors [25,26] are likely to influence such decisions.
Another well-documented example is the zinc-finger
protein Krüppel (Kr), which can activate or repress
transcription from the same DNA-binding sites depending
on its concentration [27]. At low concentrations Kr works
as an activator, whereas at high concentrations it works as
a repressor. It has been suggested that the monomeric and
dimeric forms of Kr can interact with the GTFs TFIIB and
transcription factor IIE (TFIIE), respectively, either to
activate or repress transcription [28].
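
The concentration dependence described above for Kr can be illustrated with a minimal numerical sketch. It assumes a simple monomer-dimer equilibrium (2 Kr <-> Kr2) with a hypothetical dissociation constant, and it assumes, purely for illustration, that Kr behaves as a repressor once more than half of the protein is dimeric. Neither the constant nor the 50% threshold comes from the cited studies [27,28], and the function names are invented for this example.

```python
import math


def monomer_concentration(total, kd):
    """Free monomer concentration for 2M <-> D with Kd = [M]^2 / [D].

    Mass balance: total = [M] + 2[D] = [M] + 2[M]^2 / Kd, which gives
    [M] = Kd * (sqrt(1 + 8 * total / Kd) - 1) / 4.
    Units are arbitrary but must match between total and kd.
    """
    if total <= 0:
        return 0.0
    return kd * (math.sqrt(1.0 + 8.0 * total / kd) - 1.0) / 4.0


def dimer_fraction(total, kd):
    """Fraction of all protein molecules that sit in dimers."""
    if total <= 0:
        return 0.0
    return 1.0 - monomer_concentration(total, kd) / total


def predicted_role(total, kd):
    """Illustrative rule (an assumption): mostly dimer -> repressor."""
    return "repressor" if dimer_fraction(total, kd) > 0.5 else "activator"


if __name__ == "__main__":
    kd = 1.0  # hypothetical dissociation constant
    for total in (0.01, 0.1, 1.0, 10.0, 100.0):
        print(total, round(dimer_fraction(total, kd), 3), predicted_role(total, kd))
```

Under these assumptions the dimeric fraction rises steadily with total concentration, so the same protein shifts from mostly monomer (activator in this sketch) to mostly dimer (repressor) without any change in its DNA-binding sites.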

Signal-dependent  switches  between  activators  and 
repressors 

Many  TFs  can  have  distinctive  activator  or  repressor  roles 
in  a  signal-dependent  manner.  Members  of  the  nuclear 
receptor  superfamily  represent  excellent  examples  for 
signal-induced  functional  switches  [29-32].  These  are 
zinc-finger  TFs  that  respond  to  ligands,  for  example, 
retinoids,  steroids  and  thyroid  hormones;  some  members 
of  this  family  are  called  orphan  receptors  because  their 
ligands  are  unknown.  Nuclear  receptor  superfamily 
members have important roles in many biological processes
such as differentiation, proliferation and cell death.
When they are in a complex with their ligands,
these proteins activate transcription by recruiting the
HAT co-activators [e.g. p300-CBP and p300-CBP-associated
factor (PCAF)] and chromatin remodeling complexes
to target gene promoters. However, in the absence of
ligands or in the presence of antagonists, many of
these proteins function as repressors by recruiting the
Sin3A-HDAC complex through the nuclear receptor co-repressor
(NCoR) and silencing mediator for retinoid
and thyroid receptors (SMRT). Structural studies have
revealed that ligands and antagonists can result in
distinct conformations of the receptor proteins, thus
affecting their abilities to interact with co-activators or
co-repressors [33-35].

Several other signaling pathways also involve ligand-induced
functional switches of TFs [36-40]. For example,
the canonical Wnt signaling pathway is dependent on the
high mobility group (HMG) proteins T-cell factor (TCF) or
lymphocyte enhancer-binding factor (LEF). In the absence
of Wnt signaling, TCF binds to the Wnt-response elements
and represses transcription by recruiting co-repressor
proteins such as Groucho (Gro) and C-terminal-binding
protein (CtBP). Wnt signaling leads to the nuclear
accumulation of the co-activator β-catenin, which interacts
with DNA-bound TCF, recruits co-activators such as
CBP and activates transcription. Similarly, in the absence
of Notch signaling, the DNA-binding protein Drosophila
Suppressor of Hairless or mammalian C-promoter-binding
factor 1 [Su(H)-CBF-1] represses transcription by
interacting with co-repressors such as CtBP and Gro in
Drosophila and the NCoR-SMRT complexes in mammals.
Ligand-induced activation of the membrane-bound Notch
protein leads to the release of the intracellular domain of
Notch (NIC), which then enters the nucleus, interacts with
DNA-bound Su(H)-CBF-1, recruits co-activators such as
CBP and PCAF and activates transcription. In both cases,
the DNA-binding TFs [LEF-TCF in Wnt signaling and
Su(H)-CBF-1 in Notch signaling] switch from repressors
to  activators  in  a  ligand-dependent  manner  (Figure  2).  In 
Hedgehog  (Hh)  signaling,  the  zinc-finger  TF  Cubitus 
interruptus  (Ci)  can  enter  the  nucleus  either  as  a 
truncated  protein  to  repress  transcription  (in  the  absence 
of  Hh)  or  as  a  full-length  protein  to  activate  transcription 
(in  the  presence  of  Hh).  (For  a  discussion  of  TFs  in  other 
signaling  pathways,  see  Barolo  and  Posakony  [36].) 
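
The Wnt and Notch switches described above share one logic: the same DNA-bound TF recruits co-repressors without the signal and co-activators with it. A minimal sketch of that logic as a lookup table follows. The partner names are taken from the text, whereas the data structure and function names are hypothetical and chosen only for illustration. Hedgehog signaling is left out because Ci switches through a truncated versus full-length form, not through a signal-induced co-activator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathway:
    tf: str                        # DNA-bound transcription factor
    corepressors: tuple            # partners recruited without the signal
    signal_coactivator: str        # co-activator that enters the nucleus on signaling
    recruited_coactivators: tuple  # additional co-activators recruited afterwards


# Entries summarize the text; this is not an exhaustive list of partners.
PATHWAYS = {
    "Wnt": Pathway("LEF-TCF", ("Gro", "CtBP"), "beta-catenin", ("CBP",)),
    "Notch": Pathway("Su(H)-CBF-1", ("CtBP", "Gro", "NCoR-SMRT"), "NIC", ("CBP", "PCAF")),
}


def nuclear_state(pathway, signal_present):
    """Return (outcome, DNA-bound TF, partner proteins) for one pathway."""
    p = PATHWAYS[pathway]
    if signal_present:
        partners = (p.signal_coactivator,) + p.recruited_coactivators
        return ("activation", p.tf, partners)
    return ("repression", p.tf, p.corepressors)


if __name__ == "__main__":
    for name in PATHWAYS:
        for signal in (False, True):
            print(name, signal, nuclear_state(name, signal))
```

Note that the TF entry is identical in both states of a pathway; only the partner list changes, which is the point of the signal-dependent switch.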

Regulation  of  activator  functions  depending  on 
enhancer  structures 

Many  transcriptional  activators  have  been  shown  to 
possess  domains  that  can  inhibit  their  own  ability  to 
activate  transcription  (see  examples  cited  in  Ref.  [41]). 
The actions of these self-inhibitory domains can range
from cytoplasmic sequestration and DNA-binding inhibition
to concealment of the activating surface. Another mechanism
that  regulates  the  function  of  an  activator  involves 
interactions  with  co-repressors.  As  discussed  earlier,  in 
extreme  cases  such  interactions  can  completely  switch  the 
function  of  a  TF  from  an  activator  to  a  repressor.  The 
Drosophila morphogenetic protein Bicoid (Bcd), a homeodomain-containing
TF, instructs embryonic patterning
by stimulating the expression of specific genes in a
concentration-dependent manner [42-44]. Although Bcd
is not known to undergo activator-repressor switches, the
regulation of its activity helps to illustrate the importance
of the enhancer structure in selectively presenting the
surfaces of a TF to enable interaction with other proteins.
Recent experiments suggest that an N-terminal
self-inhibitory domain of Bcd can interact with the co-repressor
Sin3A [41,45]; Bcd can also interact with SAP18,
a component of the Sin3A-HDAC complex [46]. Interestingly,
the N-terminal domain of Bcd also has a crucial role
in cooperative DNA binding to certain enhancers [47]. It is
proposed that, depending on the arrangements of Bcd-binding
sites in an enhancer, this dual-purpose
N-terminal domain of Bcd is preferentially used for either
cooperative DNA binding or self-inhibition [47]. This can
enable the protein to exert its regulatory activities in a
manner that is dependent not only on its own concentration
but also on the enhancer structure. Recent studies
further suggest that the co-activator CBP can interact
with Bcd and increase its activity, also in a concentration- and
enhancer-dependent manner [48].

Figure 2. Signal-induced functional switches. A schematic representation of Wnt
and Notch signaling events in the nucleus is shown. (a) In the absence of the signal,
the transcription factor (TF), lymphocyte enhancer-binding factor (LEF) or T-cell
factor (TCF) in Wnt signaling and Drosophila Suppressor of Hairless or mammalian
C-promoter-binding factor 1 [Su(H)-CBF-1] in Notch signaling, binds to the signal
response element (RE), recruits co-repressors and represses transcription. (b) In the
presence of the signal, the signal-induced co-activator (S-CA), β-catenin in Wnt
signaling and NIC in Notch signaling, enters the nucleus, interacts with the TF,
recruits additional co-activators and activates transcription.

Co-factors  with  dual  roles  in  transcription 

Historically, co-factors are loosely divided into co-activators
and co-repressors (Figure 1) but their distinctions are
becoming  blurred.  For  example,  CBP  and  p300  are 
generally  viewed  as  co-activators  [49,50]  but  in  some 
cases  they  can  also  function  as  co-repressors  [21,51-53]. 
Genome-wide  microarray  studies  have  further  revealed 
that  co-activators  and  co-repressors  can  often  have  roles  - 
in  some  cases  direct  roles  -  that  are  opposite  to  their 
designations. For example, the Swi-Snf chromatin remodeling
complexes are generally considered to have positive
roles  in  transcription  [4,7-9].  Indeed,  microarray  data  in 
yeast  show  that  mutations  affecting  components  in  the  yeast 
complex  reduce  the  transcription  of  many  genes  [54,55]. 
However,  many  other  genes  are  expressed  at  increased 
levels  in  swi-snf  mutant  cells,  suggesting  that  the 
Swi-Snf complex can also have negative roles in transcription.
Martens and Winston studied SER3 (a serine
biosynthesis  gene),  one  of  the  genes  repressed  by  Swi-Snf 
[56].  They  showed  that  the  Swi-Snf  complex  is  directly 
recruited  to  the  SER3  promoter  in  yeast.  Interestingly, 
unlike  transcriptional  activation,  which  is  dependent  on 
most  of  the  Swi-Snf  subunits,  only  Snf2  is  required  for 
SER3 repression. Precisely how Snf2 represses transcription
is unknown, but it might involve interactions with
co-repressors.  It  has  been  shown  biochemically  that 
ATP-dependent  chromatin  remodeling  proteins  can  form 
complexes  with  co-repressors  such  as  HDACs  [57-60]. 
Interestingly,  SER3  transcription  is  also  affected  by  the 
transcription of a non-coding gene (SER3 regulatory gene
1 or SRG1) located upstream of SER3, but in a manner
that  appears  to  be  independent  of  Snf2  [61]. 

Sin3 and HDAC proteins are generally viewed as co-repressors
because they restrict the accessibility of DNA to
TFs and GTFs [3-6]. However, microarray experiments
show that, although the expression of many genes is
upregulated in yeast cells that contain either a
SWI-independent 3 (sin3) or an rpd3 mutation (RPD3
encodes a yeast class I HDAC), several genes are actually
expressed at reduced levels [62,63]. These results suggest
that  Sin3  and  Rpd3  can  also  have  positive  roles  in 
transcription.  Kinetic  experiments  using  the  HDAC 
inhibitor  trichostatin  A  (TSA)  further  provide  evidence 
of  a  direct  role  of  Rpd3  in  the  transcriptional  activation  of 
some  genes  [62].  In  a  study  by  Wang  et  al.  [64],  it  was 
shown  that  another  class  I  HDAC  in  yeast,  encoded  by 
Hos2, is required specifically for transcriptional activation.
Hos2 is preferentially associated with actively
transcribed  genes  in  the  genome.  Interestingly,  unlike 
Rpd3,  which  can  deacetylate  most  lysines  in  the  tails  of  all 
four  histones,  Hos2  exhibits  a  strong  preference  for  lysines 
in histones H3 and H4. These results show that not all histone
acetylation events are equal in their roles in regulating
transcription. In addition to acetylation, histones are
also  subject  to  other  types  of  modifications  including 
methylation,  phosphorylation  and  ubiquitination,  and  all 
of  these  modifications  can  work  in  combination  to  achieve 
distinctive regulatory outcomes [65,66].

A 'transcriptional clock' containing co-repressor
proteins

In  an  elegant  study  reported  recently,  Metivier  et  al.  [67] 
used the chromatin immunoprecipitation (ChIP) technique
to determine the occupancy of a wide panel of GTFs
and  co-factors  at  an  oestrogen-inducible  promoter.  It  was 
revealed  that  these  factors  join  and  depart  from  the 
transcription  complexes  formed  at  the  promoter  in  a  cyclic 
manner.  This  sequential  and  combinatorial  assembly  of 
transcription  complexes  at  the  promoter  was  referred  to  as 
a 'transcriptional clock', with each cycle lasting ~40
minutes. Surprisingly, the co-repressor proteins HDAC1 and
HDAC7 are also integral parts of this transcriptional
clock.  The  loading  and  exiting  of  these  proteins,  together 
with  the  Swi-Snf  chromatin-remodeling  complex,  appear 
to  mark  the  end  of  each  cycle  and  prepare  the  promoter  for 
the  next  cycle.  Although  it  is  currently  unknown  whether 
transcriptional  activation  at  other  promoters  requires  the 
same  type  of  waves  of  complex  formation,  the  findings 
described  in  this  study  clearly  illustrate  that  proteins  that 
are  generally  viewed  as  co-repressors  (such  as  HDACs) 
actually have integral roles during the transcription-activation
process.

General  transcription  factors  mediating  gene  silencing 

According to a well-established recruitment model, transcriptional
activation is a process of bringing, ultimately,
the  transcription  machinery  to  a  promoter  [68,69]. 
However,  recent  studies  in  yeast  and  Drosophila  suggest 
that  gene  silencing  can  act  following  the  assembly  of  the 
transcription  machinery.  Cell-type  specification  in  the 
baker's yeast Saccharomyces cerevisiae is controlled by
gene products encoded by the MAT locus. In addition to
this active locus, there are two donor cassettes (HMR and
HML) that are silenced by cis-elements called silencers.
Several proteins, including repressor activator protein 1
(RAP1), ARS-binding factor 1 (ABF1) and the origin recognition
complex (ORC), bind to DNA sequences in the
silencers  and  recruit  the  silencing  information  regulator 
(SIR)  complex  containing  Sir2,  Sir3  and  Sir4  [70,71].  Sir2 
is a nicotinamide adenine dinucleotide (NAD)-dependent
HDAC (class III), whereas Sir3 and Sir4 (which are not
HDACs)  can  interact  with  the  hypoacetylated  N-terminal 
tails  of  histones  H3  and  H4  to  enable  the  complex  to 
propagate  along  nucleosomes.  Because  silenced  chromatin 
is  refractory  to  sequence-specific  DNA-binding  proteins 
such  as  restriction  enzymes  [72],  it  was  thought,  according 
to  one  model,  that  SIR-dependent  hypoacetylation  of 
chromatin  at  the  silent  cassettes  prevented  activators 
and  GTFs  from  accessing  DNA.  However,  a  recent  report 
argued  against  this  simple  idea  [73].  Using  ChIP  assays, 
Sekinger  and  Gross  showed  that  the  SIR-silenced  Hsp82 
promoter  is  actually  occupied  by  not  only  the  activator 
heat  shock  factor  (HSF)  but  also  components  of  the 
transcription  machinery  such  as  TBP  and  RNAP.  Both 
TBP  and  RNAP  are  also  present  at  the  promoter  of 
HMRa1, a natural target gene silenced by the SIR
complex.  Precisely  how  the  assembled  transcription 
machinery  remains  inactive  at  the  SIR-silenced  genes  is 
not  well  understood  but,  as  discussed  in  the  next  section, 
one  possible  mechanism  might  involve  direct  interactions 
between  co-repressors  and  GTFs. 

Studies  of  homeotic  gene  regulation  in  Drosophila  have 
revealed  specific  interactions  between  co-repressors  and 
GTFs  [74,75].  Homeotic  genes  themselves  encode  TFs  that 
control  segment  identity.  In  early  embryogenesis,  the 
expression  of  these  genes  is  initiated  by  DNA-binding  TFs 
such  as  those  encoded  by  gap  and  pair-rule  genes.  The 
expression  profiles  of  the  homeotic  genes  are  subsequently 
controlled by the trithorax group (trxG) and PcG genes,
which maintain their active and silenced states, respectively
(Figure 3). TrxG proteins can form chromatin
remodeling  complexes  related  to  the  yeast  Swi-Snf 
complex  [76-78],  whereas  PcG  proteins  assemble  into 
large  complexes  that  can  contain  co-repressors  such  as 
Sin3A  and  HDAC1  [79,80].  Paradoxically,  PcG  and  trxG 
proteins  often  co-localize  on  DNA,  suggesting  that  these 
proteins  with  opposite  functions  can  work  in  concert  to 
regulate  transcription  [81,82].  More  intriguingly,  a  PcG 
repressive  complex  (PRC1)  actually  contains  several  GTFs 
such  as  TBP  and  TBP-associated  factors  (TAFs)  [80].  In 
addition,  ChIP  experiments  show  that  many  PcG-silenced 
promoters  are  occupied  by  components  of  the  transcription 
machinery  including  TBP,  TFIIB  and  RNAP  [83,84]. 
Together,  these  studies  of  both  SIR-mediated  silencing 
in  yeast  and  PcG-mediated  repression  in  Drosophila 
suggest  that  one  mechanism  of  gene  silencing  is  to 
keep the pre-assembled transcription machinery inactive,
probably through interactions between GTFs and
co-repressors  (Figure  3).  This  repression  mechanism  is 
distinct  from  the  mechanism  where  repressors  interact 
with  GTFs  to  prevent  the  assembly  of  the  transcription 
machinery  [28,85]. 

Figure 3. Homeotic gene expression. A schematic representation depicts the roles of trithorax group (trxG) co-activators and Polycomb group (PcG) co-repressors in
maintaining (a) the active and (b) inactive expression states of a homeotic gene, respectively. Note that in both cases the general transcription factors (GTFs), including
TATA-binding protein (TBP) and RNA polymerase (RNAP), are present at the promoter. Both trxG and PcG proteins can exert their effects through Polycomb response
elements (PREs), by interacting with proteins (not shown) that recognize specific DNA sequences in these elements or by recognizing epigenetic marks (e.g. methylated
histones) at these locations [74,75].

Three words important in transcription control: context,
context, context

It is evident that regulation of gene expression is a highly
dynamic process requiring the actions of - and communications
between - many proteins. Although these proteins
help make unambiguous and simple decisions to either
activate or repress a gene, their roles and interactions
often cross the activation-repression line. Whether a TF
activates or represses transcription of a given gene is
determined  by  the  specific  microenvironment  in  which  it 
operates,  including  such  parameters  as:  (i)  its  own 
concentration  and  physical  form  (e.g.  ligand  interaction 
or modifications including ubiquitination and SUMOylation
[86]); (ii) how it interacts with other TFs on DNA
(e.g.  competitive  or  cooperative  DNA  binding);  and  (iii)  the 
types  of  surfaces  (e.g.  activation  versus  repression)  that 
are  available  for  interacting  with  co-factors  and/or  GTFs 
upon ‘landing’ on an enhancer. The availability and
relative concentrations of specific co-activators and
co-repressors, and the chromatin architecture and initial
expression  status  of  a  gene  are  also  important  variables 
that  can  influence  the  ability  of  a  TF  to  activate  or  repress  a 
gene.  Similarly,  the  ability  of  a  co-factor  to  activate  or 
repress transcription is also influenced by the microenvironment
in which it works and, as discussed previously,
its  role  can  be  time-dependent.  Even  GTFs  can  interact  with 
co-repressors  to  mediate  gene  silencing,  in  addition  to  their 
roles  of  interacting  with  activators  and  co-activators  to 
mediate  gene  activation.  It  is  clear  that  proteins  that  control 
transcription  have  multiple  ‘personalities’:  their  roles 
depend  on  when  and  where  they  are  in  the  transcription 
process and on the proteins that are around them.

Summary

Transcriptional  activators  are  required  to  turn  on 
the expression of genes in a eukaryotic cell. Activators
bound to enhancers stimulate the assembly and activity
of the transcription machinery at gene promoters. This
article  examines  selected  issues  in  understanding 
activator  functions  and  activation  mechanisms. 

Introduction 

Transcription  is  the  process  of  copying  (transcribing) 
the  information  from  one  strand  of  DNA  into  RNA  by 
the  enzyme  called  RNA  polymerase  (RNAP).  In 
bacteria  there  is  only  one  RNAP,  but  in  eukaryotes  there 
are  three  different  RNAPs  that  transcribe  different 
classes of genes (Hahn, 2004). RNAP II is responsible
for transcribing protein-coding genes, whereas RNAP I
and III are responsible for synthesizing rRNA and
tRNA, respectively. This article deals with transcription
by RNAP II (referred to as RNAP from now on), which
has been subject to intensive investigation over the past
decades (Kadonaga, 2004; Sims et al., 2004b). A major
focus  of  this  article  is  to  discuss  mechanisms  leading  to 
increased  levels  of  transcription,  a  process  called 
activation. 

In  addition  to  the  coding  sequence,  a  typical  class 
II  gene  contains  at  least  two  other  types  of  DNA 
sequences  that  are  required  for  initiating  transcription. 
The  first  such  elements  are  called  promoters  (also 
referred  to  as  core  promoters).  These  are  specific  DNA 


sequences  located  upstream  of  the  coding  regions  of  the 
genes. Promoters help orient RNAP so that it "knows"
where on DNA to start transcribing and in which
direction. RNAP itself does not have the ability to
recognize specific DNA sequences such as promoters.
Instead, a group of proteins, called general transcription
factors (GTFs), help RNAP to find promoter sequences
(Hampsey, 1998; Orphanides et al., 1996). One of these
GTFs is the TATA-box binding protein (TBP), which
directly binds to the TATA element of a promoter. The
protein complex assembled at the promoter is often
referred to as the preinitiation complex or transcription
machinery (or apparatus). This complex contains GTFs
and RNAP. It also contains co-factors and chromatin
modifying/remodeling factors (or complexes) that are
part of the RNAP holoenzyme. Many of these
additional factors play important roles in mediating
transcription regulation by responding to regulatory
proteins (Levine and Tjian, 2003; Malik and Roeder,
2000; Naar et al., 2001; Narlikar et al., 2002).

The  second  type  of  DNA  elements  required  for 
initiating  gene  transcription  is  the  regulatory  elements, 
to which regulatory proteins bind. Those elements that
play  positive  roles  in  transcription  are  called  upstream 
activation  sequences  (UASs)  in  yeast  and  enhancers  in 
higher  eukaryotes  such  as  humans.  These  sequences 
provide  the  binding  sites  for  transcriptional  activators 
that  increase  the  levels  of  gene  transcription. 
Enhancers (and proximal-promoter elements) play
particularly important roles in gene expression: genes in
eukaryotic cells tend to stay silent (off) unless they are
stimulated (turned on) by activators bound to enhancers.
This is obvious for genes that need to be specifically
turned on at precise times and locations in response to
environmental or developmental signals. This is even
true for housekeeping genes that appear to be
transcribed at all times; for these genes, their
transcription is also dependent on activators bound to
regulatory sequences. Many enhancers are located
upstream of the genes, but they have also been found in
introns or even downstream of the genes. A special
feature of enhancers is that they can stimulate
transcription in an orientation- and distance-independent
manner (Blackwood and Kadonaga, 1998). There are
also regulatory elements that play negative roles in
transcription; these elements contain binding sites for
transcriptional repressors. This article primarily deals
with activator functions and mechanisms of activation.

For our discussion in this chapter, we treat proximal-promoter elements,
which are located immediately upstream of the core promoters, as part of the
regulatory elements.

There  are  several  other  types  of  DNA  sequences 
that also play important roles in transcription but are not
further discussed in this article. For example, the
polyadenylation site located at the end of a gene
instructs RNAP to terminate transcription (Ares and
Proudfoot, 2005; Tollervey, 2004). In this article we
will first discuss what a typical activator looks like and
how  it  might  activate  transcription.  We  will  then  expand 
our  discussion  by  examining  selected  issues  to  further 
explore  activator  functions  and  activation  mechanisms. 

A  Typical  Transcriptional  Activator 

A typical activator has two essential functions:
DNA binding and transcriptional activation (Ptashne,
1988). Many activators have separate protein domains
to confer these two functions. The DNA binding
domain of an activator enables the protein to recognize
specific DNA sites located within enhancers. There are
different families of DNA binding domains that form
distinct structures to recognize DNA (Garvie and
Wolberger, 2001). These domains tend to bear names
that depict their structural and/or functional properties
or follow their founding members' names. For example,
a zinc-finger DNA binding domain uses zinc to
maintain its three-dimensional structure required for
DNA recognition. A basic region-leucine zipper (bZIP)
domain contains a basic region (that contacts DNA) and
a leucine zipper (that forms dimers). A homeodomain is
a conserved 60-aa DNA binding domain initially
identified in proteins encoded by Drosophila homeotic
genes, which play critical roles in specifying segment
identity. A Rel homology domain is a DNA binding
domain that bears the name of its founding member, Rel.

Most DNA binding domains, including all the
examples mentioned above, recognize short, specific
DNA sequences by making elaborate contacts with the
bases in the major groove of the DNA double helix
(Garvie and Wolberger, 2001; Patikoglou and Burley,
1997). Others, e.g., the high-mobility-group (HMG)
domain, recognize DNA sequences by interacting with
the minor groove (Travers, 2000). While most DNA
binding domains recognize DNA sites as dimers (e.g.,
bZIP and Rel family members), others can bind as
monomers (e.g., some homeodomain proteins). For
proteins that bind DNA as dimers, many can form both
homodimers and heterodimers with other family members.
Heterodimer formation can increase the repertoire of
DNA sequences recognized by a given family of
transcription factors. Many activators can bind DNA
cooperatively with one another, which can increase the
stability of the protein complexes formed at the enhancers
(Adams and Workman, 1995; Ma et al., 1996).

The activation domain of an activator plays critical
roles in stimulating transcription. Unlike DNA binding
domains that require elaborate structures for DNA
recognition, activation domains tend to be short
sequences often with very limited sequence complexity.
There are different types of activation domains, which
are named after their sequence characteristics, such as
acidic, glutamine-rich, proline-rich, and alanine-rich.
For the acidic class of activating sequences, it was
estimated that 1% of the peptides encoded by random
DNA sequences (from the E. coli genome) can activate
transcription when fused to a DNA binding domain (Ma
and Ptashne, 1987c). This finding further highlights the
“relaxed” specificity between activation sequences and
their targets (Ma, 2004), a feature that contrasts with the
interaction mode between DNA binding domains and
DNA sites.

One important finding in understanding activator
functions was the demonstration of the modular nature
of activators, i.e., the DNA binding and activation
functions are provided by separable domains (Brent and
Ptashne, 1985; Keegan et al., 1986). This finding
suggested that DNA binding per se was insufficient for
activation in eukaryotes (Brent, 2004; Ptashne, 2004).
Subsequent demonstration that activation sequences are
short, simple peptides (Hope and Struhl, 1986; Ma and
Ptashne, 1987b; Ma and Ptashne, 1987c) further
supported the notion that activation domains achieved
their functions by touching other proteins (also see
below). The demonstration of activators' modular nature
has also enabled researchers to determine easily whether a
particular transcription factor has an activation function,
by assaying the activity of hybrid
proteins containing the factor's fragments fused to a
heterologous DNA binding domain.

[Footnote: TBP, a GTF that can bind specific DNA sequences,
also makes contacts with the minor groove (Kim et al., 1993;
Nikolov et al., 1992).]

The  Recruitment  Model 

What does an activator do to stimulate transcription?
As discussed above, an essential domain of an activator
is its DNA binding domain, which brings the activator
to DNA sites in an enhancer. But the DNA binding
domain itself is insufficient to activate transcription: an
activation domain is required for activation. Activation
domains have been shown to have the ability to interact
with a wide array of proteins, many of which are
components of the transcription machinery, including
the GTFs (e.g., TBP, TFIIB, TFIIE, and TAFs), co-factors
and chromatin modifying/remodeling complexes (Malik
and Roeder, 2000; Naar et al., 2001; Narlikar et al.,
2002; Orphanides et al., 1996; Peterson and Workman,
2000; Ptashne and Gann, 1990). All these (and other)
interactions lead to a unified final outcome: an increased
level of transcription. According to a well-established
recruitment model, the ultimate and only goal of these
interactions is to bring the transcription machinery, in
particular RNAP, to the promoter (Ptashne and Gann,
1997; Stargell and Struhl, 1996). During the activation
process, a DNA loop may be formed as a result of the
interaction between the activator bound at the enhancer
and the transcription machinery at the core promoter
(Ptashne, 1986).

Several lines of evidence support the recruitment
model. First, for many genes, the GTFs and RNAP are
absent from their promoters unless the genes are turned
on by activators (Chatterjee and Struhl, 1995; Klein and
Struhl, 1994; Li et al., 1999). Second, activators are
known to interact with components of the transcription
machinery: as noted above, one property common to the
activation domains is that they tend to have the ability
to interact with multiple target proteins (Bryant and
Ptashne, 2003; Ma, 2004; Ptashne and Gann, 1990).
Finally, in a set of “artificial recruitment” experiments
that provided pivotal support to this model, it was
shown that transcription can be elicited by artificially
attaching components of the transcription machinery to
a DNA binding domain (Chatterjee and Struhl, 1995;
Farrell et al., 1996; Gonzalez-Couto et al., 1997; Nevado
et al., 1999; Xiao et al., 1995). In these artificial
recruitment experiments, the requirement of a classical
activator is bypassed, i.e., the activator is no longer
needed for transcription. This suggested that, at least for
the promoters tested, all the functions that are provided
by the activators could be substituted by physically
bringing the RNAP holoenzyme to the promoter. It
should be noted that, since the eukaryotic DNA is
wrapped in nucleosomes, the recruitment model may
also cover situations in which the chromatin modifying
or remodeling factors recruited by activators facilitate
the assembly of the transcription machinery by increasing
the accessibility of promoter DNA. As discussed below,
the recruitment model, though attractive due to its
simplicity and experimental support, does not exclude
other possibilities of how activators may stimulate
transcription.

Now with this broad description of what a typical
activator looks like and how it may activate transcription
according to one model, we will discuss several additional
issues to further our understanding of activator functions
and activation mechanisms. Readers should refer to
other chapters in this volume that discuss specific
examples of activators in greater detail.

Composite Activators

Although a typical activator contains both an
activation domain and a DNA binding domain, sometimes
these two domains can reside on separate proteins. For
example, the herpes simplex virus (HSV) activator
VP16 does not bind to DNA, but rather, it is brought to
DNA by interacting with other DNA-binding proteins
(Triezenberg et al., 1988). The activation domain of
VP16 can also activate transcription when directly
linked to a DNA binding domain (Sadowski et al.,
1988). This finding demonstrated that an activation
domain can be brought to DNA by distinct, but
interchangeable, means, either directly binding to DNA
(through its linked DNA binding domain) or interacting
with other DNA-binding proteins. This concept was
further demonstrated by the creation of an artificial
composite activator (Ma and Ptashne, 1988). The yeast
repressor protein GAL80 inhibits the activation function
of GAL4 by interacting with and masking its activation
domain (Johnston et al., 1987; Lue et al., 1987; Ma and
Ptashne, 1987a). When GAL80 was attached to an
activation domain, the hybrid GAL80 protein, which
itself cannot bind DNA, gained an ability to activate
transcription, but only through a GAL4 derivative that
could interact with both DNA and GAL80 (Ma and
Ptashne, 1988). The concept that an activation domain
can be brought to DNA through protein-protein
interactions led to the proposal of the yeast two-hybrid
system (Fields and Song, 1989). This powerful genetic
system has allowed researchers to dissect protein-protein
interactions and to identify proteins' interacting
partners (Bai and Elledge, 1996; Fields and Sternglanz,
1994; Ma, 2000).

Transcriptional activators that do not bind DNA but
interact with other DNA-binding proteins are sometimes
also referred to as co-activators. But it may be useful to
make a distinction between these non-DNA binding
activators and the “true” co-activators that play more
general roles in transcription. Unlike non-DNA binding
activators, which are gene-specific, co-activators of the
latter class (e.g., CBP and Swi-Snf complexes) play
important roles in facilitating the actions of many
activators. Some of these general co-activators are
components of the RNAP holoenzyme (Myers and
Kornberg, 2000; Ranish and Hahn, 1996).

The  concept  that  an  activation  domain  can  be 
brought  to  its  action  site,  the  vicinity  of  a  gene's 
promoter,  through  multiple  means  can  be  further 
extended  to  activators  that  bind  to  RNA  sequences.  One 
such  example  is  the  HIV  activator  Tat,  which  is 
discussed  in  further  detail  in  another  chapter  of  this 
volume.  Relevant  to  this  discussion  is  the  finding  that 
the RNA-binding activator Tat can also activate
transcription from DNA sites when fused to a DNA
binding domain (Southgate and Green, 1991), further
illustrating that an activation domain can be brought to
the vicinity of a promoter through distinct, but
interchangeable, mechanisms.

Conformational  Changes 

The artificial recruitment experiments mentioned
above support the notion that the ultimate and only
function of activators is to bring RNAP to the promoter.
However, it is known that the preinitiation complex undergoes
several conformational changes before RNAP actually
initiates transcription (Carey and Smale, 2000). For
example, the promoter DNA is significantly bent and
unwound upon TBP binding (Kim et al., 1993; Nikolov
et al., 1992). In addition, the DNA double helix at the
transcription start site becomes unpaired, or melted, to
form a bubble prior to transcription initiation by RNAP
(Giardina and Lis, 1993; Wang et al., 1992). In one
study, it was shown that activators can change the
conformation of the TFIIA-TFIID-TATA complex and
such a conformational change is necessary and sufficient
for activation in an in vitro system (Chi and Carey,
1996). Thus, conformational changes of the transcription
machinery represent potential steps that can also be
targeted by transcriptional activators.


Initiation  vs.  Elongation 

Although  transcription  initiation  is  a  critical  step 
that  can  be  stimulated  by  many  activators,  other  steps  of 
transcription,  such  as  elongation,  can  also  be  activated. 
For  example,  the  Drosophila  heat  shock  gene  hsp70 
already  has  the  transcription  machinery  loaded  at  its 
promoter  even  before  heat  shock  (induction)  (Rougvie 
and Lis, 1988). In fact, RNAP is able to transcribe the 5'
region of the gene prior to induction, but it fails to
transcribe through the gene (Rasmussen and Lis, 1993;
Rasmussen and Lis, 1995). Upon induction, the
transcriptional activator HSF stimulates elongation by
RNAP, enabling it to complete transcription through the
gene.

Many proteins (or complexes) have been identified
that play important roles in facilitating transcription
elongation, and some of these factors represent targets
for activators (Sims et al., 2004a). For example,
experiments in Drosophila suggested that the elongation
factor P-TEFb is recruited (likely by the activator HSF)
to the heat shock loci to facilitate transcription
elongation upon heat shock induction (Lis et al., 2000).
In addition, in vitro experiments using the human hsp70
gene demonstrated that the Swi-Snf complex was
recruited by the human activator HSF1 to facilitate
transcription elongation through the chromatin template
of the gene (Brown et al., 1996). As discussed elsewhere
in this volume, the HIV Tat activator stimulates
transcription elongation by recruiting the elongation
factor P-TEFb (Mancebo et al., 1997; Zhou et al., 1998;
Zhu et al., 1997). Together, these examples highlight the
importance of the elongation step in transcriptional
activation.

In this context, it should be noted that recent
studies in both yeast and Drosophila suggest that
transcription silencing can work at a step after the
assembly of the transcription machinery (Breiling et al.,
2001; Dellino et al., 2004; Sekinger and Gross, 2001).
In other words, the mere presence of the transcription
machinery (including the RNAP itself) assembled at the
promoter does not necessarily equate to productive
transcription of the gene.

Synergism 

One  of  the  characteristic  features  of  transcriptional 
activation  is  synergism.  Synergy  refers  to  the  situation 
where  the  transcription  level  achieved  by  multiple 
activators  is  higher  than  the  sum  of  the  levels  by 
individual  factors  separately.  Synergy  can  arise  from 
different mechanisms. In the simplest case, it can be due
to cooperative binding of activators to multiple sites in
the enhancer. This is obvious particularly if the activators
are at limiting (sub-saturating) concentrations. An
enhanceosome model has been proposed that further
emphasizes the role of multiple activators for activation
(Merika and Thanos, 2001; Thanos and Maniatis, 1995).
According to this model, different activators, including
those that play architectural roles, are together required
to form a stable complex at the enhancer for efficient
transcriptional activation.

Synergy can also be achieved even when activators
are at saturating levels. This particular form of synergy,
which can be demonstrated readily under in vitro
conditions, provides useful insights into how activators
work. It suggests that activators can contact multiple
targets in the transcription machinery (Ptashne and
Gann, 1998). If all activator molecules contacted the
same target through the same surface, then increasing
the number of activator molecules should only increase
transcription level in an additive, rather than synergistic,
manner.
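
The distinction between additive and synergistic activation drawn
above can be made concrete with a small calculation. The following
is only an illustrative sketch: the function names and the example
transcription levels are hypothetical and are not taken from any of
the studies cited in this chapter. It assumes that transcription
levels are measured in the same arbitrary units above background.

```python
def additive_expectation(individual_levels):
    """Transcription level expected if activators act purely additively."""
    return sum(individual_levels)


def synergy_ratio(combined_level, individual_levels):
    """Ratio of the observed combined level to the additive expectation.

    A ratio above 1 indicates synergy as defined in the text; a ratio
    of about 1 indicates additive behavior.
    """
    expected = additive_expectation(individual_levels)
    if expected <= 0:
        raise ValueError("individual levels must sum to a positive value")
    return combined_level / expected


def is_synergistic(combined_level, individual_levels):
    """True when the combined level exceeds the sum of individual levels."""
    return combined_level > additive_expectation(individual_levels)


if __name__ == "__main__":
    # Hypothetical levels (arbitrary units) for two activators alone
    # and for both together.
    alone = [2.0, 3.0]
    together = 10.0
    print(synergy_ratio(together, alone))
    print(is_synergistic(together, alone))
```

In this hypothetical case the two activators together give twice the
additive expectation, which would be scored as synergy; a combined
level of 5.0 would instead be scored as additive.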

Studies to compare the roles of different activators
suggest that synergy may represent a consequence of
combinatorial actions of activators that work on distinct
steps of transcription (Blau et al., 1996). By comparing
the RNAP density along a gene, it is possible to gain
information about which step, initiation or elongation,
an activator may stimulate. Using this and other
analyses, Blau et al. concluded that, while some
activators (e.g., Sp1 and CTF) work primarily on the
initiation step, others (e.g., Tat) work primarily on the
elongation step (Blau et al., 1996). Another class of
activators (e.g., VP16, p53 and E2F1) can work on both
initiation and elongation steps. An analysis of these
activators revealed that synergy was only achieved
between those that work on different steps of
transcription (Blau et al., 1996).

Interplay  Between  DNA  Binding  and  Activation 
Functions 

It is well established that, in general, the DNA
binding and transcriptional activation domains of an
activator are physically separable. It should be
emphasized, however, that the functions provided by
these two domains are interconnected. First, an
activation domain cannot exert its activating effect
unless it is brought to DNA, either through a physical
link to a DNA binding domain or through other means
such as interacting with another DNA-binding protein.
Second, some activators, e.g., the glucocorticoid
receptor and MyoD, have DNA binding and activation
functions conferred by single protein domains (Davis et
al., 1990; Schena et al., 1989). Furthermore, several
studies have suggested that the DNA binding properties
of an activator can be influenced by its activation
function (Bunker and Kingston, 1996; Tanaka, 1996). In
particular, it was observed that activators that stimulated
transcription strongly bound DNA better than those that
activated weakly. It has been proposed that the
interaction between the activation domain and the
transcription machinery can help the binding of not only
the transcription machinery to the promoter but also the
activator to its DNA sites. For activators that work by
recruiting chromatin remodeling or modifying complexes,
the increased DNA accessibility is beneficial to not only
GTFs but also activators themselves. The connection
between the activation and DNA binding functions may
also contribute to how activator gradients, such as
Drosophila Bicoid, stimulate transcription in a
concentration-dependent manner (Driever et al., 1989;
Fu et al., 2003; Zhao et al., 2003; Zhao et al., 2002).

Activation  vs.  De-repression 

One of the major differences between eukaryotes
and prokaryotes is that eukaryotic genomes are
packaged into nucleosomes. Nucleosomes can impede
DNA binding of transcription factors and GTFs, thus
repressing transcription (Narlikar et al., 2002; Peterson
and Workman, 2000; Wu, 1997; Wu and Grunstein,
2000). One of the questions regarding mechanisms of
eukaryotic transcription activation is how much is due
to de-repression and how much is due to “real”
activation. It is well established that genes can be
de-repressed when histones are depleted from cells
(Han and Grunstein, 1988). To further obtain insights
into the role of histones in gene activation, Wyrick et al.
used the microarray strategy to determine the profiles of
gene expression upon histone H4 depletion (Wyrick et
al., 1999). The authors found that 15% of the genes
exhibited de-repressed (increased) expression in response
to the removal of histone H4. Genes that are located
near telomeres tend to be more sensitive to histone
depletion than genes located elsewhere. These results
show that depletion of histone can lead to gene-specific
de-repression. The authors also found that histones did
not play a generally repressive role for all genes, since
the majority (75%) of genes appeared to be insensitive
to histone depletion. Interestingly, 10% of the genes in
yeast had reduced expression upon histone depletion,
suggesting that histones and nucleosomes may also play
positive roles in transcription (Wyrick et al., 1999).
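
The three categories of genes described above (de-repressed,
insensitive, and reduced) can be illustrated with a short sketch
that sorts genes by their expression fold change upon histone
depletion. This is only a toy illustration under stated assumptions:
the two-fold cutoff, the function names, and the gene labels are
hypothetical and do not reproduce the criteria or the data of the
cited study. The toy data set is simply constructed to mirror the
15% / 75% / 10% proportions quoted in the text.

```python
from collections import Counter

CATEGORIES = ("de-repressed", "insensitive", "reduced")


def classify(fold_change, threshold=2.0):
    """Assign a gene to a category from its depleted/normal expression ratio.

    The threshold is a hypothetical cutoff chosen for illustration.
    """
    if fold_change >= threshold:
        return "de-repressed"
    if fold_change <= 1.0 / threshold:
        return "reduced"
    return "insensitive"


def summarize(fold_changes, threshold=2.0):
    """Return the fraction of genes falling in each category."""
    if not fold_changes:
        raise ValueError("no genes supplied")
    counts = Counter(classify(fc, threshold) for fc in fold_changes.values())
    total = len(fold_changes)
    return {name: counts.get(name, 0) / total for name in CATEGORIES}


if __name__ == "__main__":
    # Hypothetical data: 20 genes, built to mirror the quoted proportions.
    toy = {f"gene{i}": 4.0 for i in range(3)}            # increased
    toy.update({f"gene{i}": 1.0 for i in range(3, 18)})  # unchanged
    toy.update({f"gene{i}": 0.2 for i in range(18, 20)}) # decreased
    for name, fraction in summarize(toy).items():
        print(f"{name}: {fraction:.0%}")
```

Changing the hypothetical cutoff would change how many genes fall in
each category, so the proportions obtained from real data depend on
the criteria used to call a gene sensitive to histone depletion.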

Activation and Cellular Memory

In some cases the effect of transcriptional activators
can be maintained or inherited even after the activators
themselves are no longer present. One such example is
homeotic gene expression in Drosophila (Levine et al.,
2004; Orlando, 2003). During early embryogenesis the
homeotic genes respond to transcription factors that are
encoded by gap and pair-rule genes. The active and
silent states of these genes are subsequently maintained
by proteins encoded by the trithorax group (trxG) and
Polycomb group (PcG) genes, respectively. trxG and
PcG proteins form co-factor complexes that work through
DNA elements called Polycomb response elements
(PREs). In an elegant study, an isolated PRE, Fab-7,
was shown to be able to maintain the active state of a
linked reporter construct that had been activated by a
transiently expressed activator (Cavalli and Paro, 1998).
In other words, the reporter gene remained on even after
the activator itself was no longer present. Intriguingly,
such memory can be transmitted in an
activator-independent manner to subsequent generations through
the female (but not male) germ line. Recent studies show
that components in both PcG and trxG complexes
contain histone methyltransferase (HMT) activities with
different specificities/preferences for different lysine
residues in histone tails (Levine et al., 2004; Lund and
van Lohuizen, 2004; Sims et al., 2003). It is thought
that distinct histone methylation patterns represent cell
memory systems to maintain the active and silent states
of homeotic genes (Orlando, 2003; Sims et al., 2003).

In yeast, genes that have been transcribed recently are
also marked by a specific pattern of histone methylation
(Hampsey and Reinberg, 2003). This is achieved by the
recruitment of the HMT Set1 to the genes by the
elongating RNAP (Ng et al., 2003). Interestingly, the
Set1-mediated histone methylation pattern persists for
some time even after the genes are no longer transcribed.
Unlike the long-term memory of active genes mediated
by trxG proteins in Drosophila (which can last for
several generations), Set1-mediated marking of recently
active genes in yeast is only short term (up to several
hours). In addition, while the consequence of the
trxG-mediated marking is to maintain the genes on, the
yeast Set1-mediated system only marks the recently
transcribed genes without actually keeping them on.
Interestingly, yeast Set1 is also involved in the
long-term memory of gene silencing (Bryk et al., 2002;
Krogan et al., 2002).

Another case of activator-induced memory is
noteworthy in this context. This is an extremely
short-term memory, which lasts only through the initial
activation process itself. Under some in vitro conditions,
transcriptional activators can induce conformational
changes of the preinitiation complex. Interestingly, one
study demonstrated that an activator-induced
conformational change persisted, and led to the
completion of the transcription process, even after the
activator itself was removed (Chi and Carey, 1996).
This result further illustrates the importance of
conformational changes in transcriptional activation.

Activator-repressor  Switches 

Although  this  chapter  deals  primarily  with  activators 
and  activation  mechanisms,  it  should  be  noted  that 
many  transcription  factors  can  often  work  as  either 
activators or repressors in a context-dependent manner
(Ma, 2005). For example, many transcription factors
that mediate signal transduction processes work as
repressors in the absence of the signals but as activators
in the presence of the signals. In addition, the
concentrations and posttranslational modifications of a
transcription factor can affect its ability to either
activate or repress transcription. The presence of other
nearby DNA binding proteins on DNA, as well as the
availability and concentration of co-factors, can also
influence the behavior of a transcription factor. See a
recent review article for further details (Ma, 2005).

Short  Distance  and  Long  Distance  Actions 

One  of  the  questions  in  eukaryotic  gene  activation 
concerns  actions  at  long  distances.  Enhancers  in  higher 
eukaryotes have the ability to exert their effects even
when they are located many kilobases away from the
promoters (Blackwood and Kadonaga, 1998). There are
no specific definitions of short distance vs. long
distance, but for our discussion we can consider short
distance as anything up to a few hundred base pairs and
long distance greater than one kilobase (Blackwood and
Kadonaga, 1998; Dorsett, 1999). The mechanisms for
activation at short or long distances may be fundamentally
similar in that they are both achieved through a network
of protein-protein interactions and alterations of chromatin
structure. But activation at a long distance (e.g., 50-60
kilobases) faces two additional challenges that are less
relevant to activation at a short distance (e.g., 100-200 bp).
First, how can promoters and enhancers communicate
through such long distances? Second, how does an
enhancer “choose” to activate one promoter, but not
another one that is also within its reach?

Proteins called facilitators have been proposed to
promote the interaction between enhancers and promoters
that are separated by long distances (Bulger and
Groudine, 1999; Dorsett, 1999). One such example is a
Drosophila protein called Chip (Morcillo et al., 1997;
Torigoi et al., 2000). It is thought that Chip can interact with
proteins that may bind throughout the genome, such as
homeodomain proteins, thus bringing enhancers closer
to promoters through the formation of a series of loops
(Dorsett, 1999).

Recent studies suggest that the efficiency (and
specificity) of the communication between enhancers
and promoters can also be augmented by DNA elements
located near the core promoters. These elements have
been called tethering elements (Bertolino and Singh,
2002; Calhoun et al., 2002). In one study, it was shown
that the POU domain of Oct-1 bound to DNA sites near
a promoter enables the promoter to respond to a distant
enhancer (Bertolino and Singh, 2002). Interestingly, the
POU domain itself does not work as a classical activator
because it cannot activate transcription. It was suggested
that the POU domain of Oct-1 recruits the TFIID
complex to the promoter, so that the promoter becomes
poised for activation by an enhancer at a distance
(Bertolino and Singh, 2002).

The specificity of long-distance communication
between enhancers and promoters can be regulated by
different mechanisms (Blackwood and Kadonaga, 1998).
First, the tethering elements mentioned above can
selectively facilitate the communication between a
promoter and one, but not another, enhancer (Calhoun
et al., 2002). Second, in some cases promoters can
compete with each other for an enhancer, so that the
enhancer preferentially communicates with the strong
promoter, while ignoring the weak promoter (Foley and
Engel, 1992; Sharpe et al., 1998). Finally, insulator
elements can prevent “unwanted” communications
between enhancers and promoters, thus encouraging
“wanted” interactions; an insulator is a DNA element
that can block the communication between an enhancer
(or a silencer) and a promoter when the insulator is
located between them, but not when it is located outside
the enhancer-promoter unit (Kuhn and Geyer, 2003;
West et al., 2002).

Recent  studies  reveal  that  the  Drosophila  genome 
contains organized domains, some as large as 200
kilobases, that contain many genes with similar
expression profiles (Spellman and Rubin, 2002). How
genes within these large domains are coordinately
regulated is currently unclear. It is proposed that each of
these domains may contain some higher order control
elements (Calhoun and Levine, 2003), such as the
recently discovered global control region (GCR) for the
mouse HoxD complex (Spitz et al., 2003; Zuniga et al.,
2004). In this context it is noted that UASs in yeast
generally  do  not  work  at  distances  greater  than  several 
hundred base pairs (also see de Bruin et al., 2001). It is
evident that metazoans have evolved mechanisms to facilitate
long-distance enhancer-promoter communications and
accommodate  the  increased  complexity  of  gene  regulation. 

Modifications  of  Activators 

Many activators are subject to posttranslational
modifications,  such  as  phosphorylation  (Brivanlou  and 
Darnell,  2002),  acetylation  (Brooks  and  Gu,  2003),  and 
glycosylation  (Jackson  and  Tjian,  1988;  Kamemura  and 
Hart, 2003). In many cases the modifications can have
positive  roles  in  transcriptional  activation.  For  example, 
phosphorylation of STAT is responsible for mediating
the JAK-STAT signal transduction pathway (Brivanlou
and Darnell, 2002; Darnell et al., 1994). Acetylation of
p53 can increase its ability to bind DNA (Brooks and
Gu, 2003; Gu and Roeder, 1997; Prives and Manley,
2001). Recent studies suggest that ubiquitination and
SUMOylation also play important roles in regulating
the  activity  of  transcription  factors.  In  several  cases,  the 
transcriptional  activation  functions  of  activators  are 
dependent on ubiquitination. Because of space limitations,
readers are referred to several recent review articles on
this topic for further details (Conaway et al., 2002;
Freiman  and  Tjian,  2003;  Gill,  2004;  Herrera  and 
Triezenberg,  2004). 

Activators with Enzymatic Activities

Although  eukaryotic  activators  themselves  generally 
contain no enzymatic activities, recent studies challenge
this generalization. A group of activators, which belong
to the family of Eyes absent (Eya), play important roles
in the development of multiple tissues and organs
including the eye, kidney and muscle (Epstein and Neel,
2003; Rebay et al., 2005). Recent studies show that Eya
proteins, which are non-DNA binding activators,
contain phosphatase activities (Li et al., 2003;
Rayapureddi et al., 2003; Tootle et al., 2003). It is
currently  not  fully  understood  what  substrates  these 
phosphatases  work  on  and  how  they  specifically 
contribute  to  the  transcription  activation  process. 
Numerous  co-factors  and  components  in  the  transcription 
machinery  contain  various  enzymatic  activities  that  play 
critical roles in transcription regulation (Shi and Shi,
2004; Sims et al., 2004b). Therefore, the presence of
enzymatic  activities  in  activators  may  not  fundamentally 
change  our  way  of  thinking  about  transcription. 
Nevertheless,  the  identification  of  enzyme-containing 
activators establishes a new paradigm of increased
complexity  in  transcription  regulation. 

Concluding  Remarks 

I would like to end our discussion by returning to
the issue introduced at the beginning of this chapter, i.e.,
a typical activator contains two important functions,
DNA binding and activation. Why, then, do activators
have to bind DNA, or, for non-DNA binding activators,
interact with other DNA-binding proteins? This question
touches the very heart of the activation process. DNA
binding brings an activator closer to the promoter, its
action site, thus effectively increasing its local
concentration for the promoter. This in turn leads to
more  efficient,  localized  interactions  between  the 
activator  bound  at  the  enhancer  and  the  transcription 
machinery  bound  at  the  promoter.  According  to  the 
recruitment  model,  such  localized  interactions  help 
recruit  the  transcription  machinery  to  the  promoter. 
Activator-recruited chromatin remodeling/modifying
complexes also exert greater, local effects on DNA
accessibility (than untargeted complexes do) to facilitate
the  assembly  of  the  transcription  machinery  at  the 
promoter.  For  genes  that  are  activated  through  other 
mechanisms, e.g., elongation, the rate-limiting steps
also respond more favorably to local stimulation than
to untargeted signals.

The  relatively  weak  interactions  between  activators 
and  components  of  the  transcription  machinery  represent 
a  critical  means  to  achieve  specificity  in  activation:  these 
interactions  (or  their  effects)  may  only  occur  efficiently 
when  the  enhancer  (to  which  activators  bind)  and  the 
promoter  (to  which  the  transcription  machinery  binds) 
are  linked.  As  discussed  already,  there  are  mechanisms 
that  can  facilitate  long-distance  communications  between 
enhancers  and  promoters.  Interestingly,  some  of  these 
and/or  additional  mechanisms  may  also  play  roles  in 
facilitating  a  rare  class  of  communications  that  occur 
between  enhancers  and  promoters  on  separate 
chromosomes (Dorsett, 1999; Duncan, 2002; Muller
and Schaffner, 1990). It should be remembered that
transcription  regulation  is  not  restricted  to  activation. 
Genes  are  also  subject  to  repression  and  silencing. 
Similar  to  activation,  repression  is  also  facilitated  by 
targeted, local (or regional) interactions, only to achieve
the  opposite  outcome  of  reducing  transcription  levels. 
Understanding  gene  regulation  requires  considerations 
of  the  integration  of  transcriptional  activation  and 
repression. 